
LGMD gene 

The invention relates to the isolated gene coding for a calcium-dependent
protease belonging to the calpain family which, when mutated, is a cause of
a disease called Limb-Girdle Muscular Dystrophy (LGMD).

The term limb-girdle muscular dystrophy (LGMD) was first proposed by 
Walton and Nattrass (1954) as part of a classification of muscular dystrophies. 
LGMD is characterised by progressive symmetrical atrophy and weakness of the 
proximal limb muscles and by elevated serum creatine kinase. Muscle biopsies 
demonstrate dystrophic lesions and electromyograms show myopathic features. 
The symptoms usually begin during the first two decades of life and the disease 
gradually worsens, often resulting in loss of walking ability 10 or 20 years after 
onset (Bushby, 1994). Yet, the precise nosological definition of LGMD still 
remains unclear. Consequently, various neuromuscular diseases such as 
facioscapulohumeral, Becker muscular dystrophies and especially spinal 
muscular atrophies have been occasionally classified under this diagnosis. For 
example, a recent study (Arikawa et al., 1991) reported that 17% (out of 41) of 
LGMD patients showed a dystrophinopathy. These issues highlight the difficulty
in undertaking an analysis of the molecular and genetic defect(s) involved in this
pathology.

Attempts to identify the genetic basis of this disease go back over 35 years.
Morton and Chung (1959) estimated that "the frequency of heterozygous carrier
... is 16 per thousand persons". The same authors also stated that "the
segregation analysis gives no evidence on whether these genes in different
families are allelic or at different loci". Both autosomal dominant and recessive
transmission have been reported, the latter being more common, with an
estimated prevalence of 10^-5 (Emery, 1991). The localisation of a gene for a
recessive form on chromosome 15 (LGMD2A, MIM 253600; Beckmann et al.,
1991) provided the definitive proof that LGMD is a specific genetic entity.
Subsequent genetic analyses confirmed this chromosome 15 localisation (Young
et al., 1992; Passos-Bueno et al., 1993), the latter group demonstrating genetic
heterogeneity of this disease. Although a recent study localised a second mutant
gene to chromosome 2 (LGMD2B, MIM 253601; Bashir et al., 1994), there is
evidence that at least one other locus can be involved.

Genetic analyses of the LGMD2 kindreds revealed unexpected findings.
First, genetic heterogeneity was demonstrated in the highly inbred Indiana Amish
community. Second, although the Isle of La Reunion families were thought to
represent a genetic isolate, at least 6 different disease haplotypes were
observed, providing evidence against the hypothesis of a single founder effect
(Beckmann et al., 1991) in this inbred population.

The nonspecific nosological definition, the relatively low prevalence and the
genetic heterogeneity of this disorder limit the number of families which can be
used to restrict the genetic boundaries of the LGMD2A interval. Cytogenetic
abnormalities, which could have helped to focus on a particular region, have not
been reported. Immunogenetic studies of dystrophin-associated proteins
(Matsumura et al., 1993) and cytoskeletal or extracellular matrix proteins such as
merosin (Tome et al., 1994) failed to demonstrate any deficiency. In addition,
there is no known specific physiological feature or animal model that could help
to identify a candidate gene. Thus, there is no alternative to a positional cloning
strategy.

It is established that the LGMD2 chromosomal region is localised on
chromosome 15, in the 15q15.1-15q21.1 region (Fougerousse et al., 1994).

Construction and analysis of a 10-12 Mb YAC contig (Fougerousse et al.,
1994) permitted the mapping of 33 polymorphic markers within this interval and
the further narrowing of the LGMD2A region to between D15S514 and D15S222.
Furthermore, extensive analysis of linkage disequilibrium suggested a likely
position for the gene in the proximal part of the contig.

The invention results from the construction of a partial cosmid map and from
screening by cDNA selection (Lovett et al., 1991; Tagle et al., 1993) for
muscle-expressed sequences encoded by this interval, which led to the
identification of a number of potential candidate genes. One of these, previously
cloned by Sorimachi et al. (1989), encodes a muscle-specific protein, nCL1 (novel
Calpain Large subunit 1), which belongs to the calpain family (CANP,
calcium-activated neutral protease; EC 3.4.22.17), and appeared to be a
functional candidate gene for this disease.



Calpains are non-lysosomal intracellular cysteine proteases which require
calcium for their catalytic activities (for a review see Croall D.E. et al., 1991). The
mammalian calpains include two ubiquitous proteins, CANP1 and CANP2, as well
as tissue-specific proteins. In addition to the muscle-specific nCL1, stomach-
specific nCL2 and nCL2' proteins have also been described; these are derived
from the same gene by alternative splicing. The ubiquitous enzymes consist of
heterodimers with distinct large subunits associated with a common small
subunit; the association of tissue-specific large subunits with a small subunit
has not yet been demonstrated. The large subunits of calpains can be
subdivided into 4 protein domains. Domains I and III, whose functions remain
unknown, show no homology with known proteins. Domain I, however, seems
important for the regulation of the proteolytic activity. Domain II shows similarity
with other cysteine proteases, sharing histidine, cysteine and asparagine
residues at its active sites. Domain IV comprises four EF-hand structures which
are potential calcium binding sites. In addition, three unique regions with no
known homology are present in the muscle-specific nCL1 protein, namely NS,
IS1 and IS2, the latter containing a nuclear translocation signal. These regions
may be important for the muscle-specific function of nCL1.

It is generally accepted that muscular dystrophies are associated with excess
or deregulated calpains, and all the known approaches for treating these diseases
rely on antagonists of these proteases; examples are disclosed in EP
359309 or EP 525420.

The invention results from the finding that, contrary to all these
hypotheses, the LGMD2 disease is strongly correlated with a defect of a calpain
which is expressed in healthy people.

The invention relates to the nucleic acid sequence such as represented in
Figure 2 coding for a Ca2+-dependent protease, or calpain, which is involved in
LGMD2 disease, and more precisely LGMD2A. It also relates to a part of this
sequence provided it is able to code for a protein having a calcium-dependent
protease activity involved in LGMD2, or to a sequence derived from one of the
above sequences by substitution, deletion or addition of one or more nucleotides
provided that said sequence still codes for said protein, and to all the nucleic
acids having a sequence complementary to a sequence as defined above.





The genomic organisation of the human nCL1 gene has been determined by
the inventors; the gene consists of 24 exons and extends over 40 kb, as
represented in Figure 8, and is also a part of the invention. About 35 kb of this
gene have been sequenced. A systematic screening of this gene in LGMD2A
families led to the identification of 14 different mutations, establishing that a
number of independent mutational events in nCL1 are responsible for LGMD2A.
Furthermore, this is the first demonstration of a muscular dystrophy resulting
from an enzymatic rather than a structural defect.

In the present specification, CANP3 means the protein which is a Ca2+-
dependent protease, or calpain, and is coded by the nCL1 gene on chromosome
15.

The invention also relates to a protein, called CANP3, consisting of the
amino acid sequence such as represented in Figure 2 and which is involved,
when mutated, in the LGMD2 disease.

The cDNA of the gene coding for CANP3 is also represented in Figure 2, and
is a part of the invention.

The protein coded by this DNA is CANP3, a calcium-dependent protease 
belonging to the Calpain family. 

Also included in the present invention are the nucleic acid sequences
derived from the cDNA of Figure 2 by one or more substitutions, deletions,
insertions, or by mutations in 5' or 3' non-coding regions or in splice sites,
provided that the translated protein has the calcium-dependent protease
activity and, when mutated, induces LGMD2 disease.

The nucleic acid sequence encoding the protein may be DNA or RNA and may
be complementary to the nucleic acid sequence represented in Figure 2.

The invention also relates to a recombinant vector including a DNA 
sequence of the invention, under the control of a promoter allowing the 
expression of the calpain in an appropriate host cell. 

A procaryotic or eucaryotic host cell transformed by or transfected with a
DNA sequence comprising all or part of the sequence of Figure 2 is a part of the
invention.

Such a host cell might be either:

- a cell which is able to secrete the protein, in which case this recombinant
protein might be used as a drug to treat LGMD2, or

- a packaging cell line transfected by a viral or retroviral vector; the cell
lines bearing the recombinant vector might be used as a drug for gene therapy of
LGMD2.

All the systems used today for gene therapy, including adenoviruses and
retroviruses and others described for example in « L'ADN médicament » (John
Libbey, Eurotext, 1993), and bearing one of the DNA sequences of the invention,
are included herein by reference.

The examples hereunder and attached figures indicate how the structure of
the gene was established, and how the relationship between the gene and
LGMD was established.

Legend of the figures :
Figure 1 :

A) Genomic organisation of the nCL1 gene

The gene covers a 40 kb region of which 35 kb were sequenced (Accession
number pending). Introns and exons are drawn to scale, the latter being
indicated by numbered vertical bars. The first intron is the largest one and
remains to be fully sequenced. Positions of intragenic microsatellites are indicated
by asterisks. Arrows indicate the orientation of Alu (closed) and of Mer2 (greyed)
repeat sequences.

B) EcoRI restriction map

An EcoRI (E) restriction map of this region was established with the help
of cosmids from this region. The location of the nCL1 gene is indicated as a black
bar. The sizes of the corresponding fragments are indicated and are underlined
when determined by sequence analysis.

C) Cosmid map of the nCL1 gene region.

Cosmids were from a cosmid library constructed by subcloning YAC
774G4 (Richard in preparation) and are presented as lines. Dots on lines
indicate positive STSs (indicated in boxed rectangles). A minimum of three
cosmids cover the entire gene.





Figure 2: Sequence of the human nCL1 cDNA (B), and the flanking 5' (A) and 3'
(C) genomic regions.

A) and C) The polyadenylation signal and putative CAAT, TATAA sites are
boxed. Putative Sp1 (position -477 to -472), MEF2 binding sites (-364 to -343)
and CArG box (-685 to -672) are in bold. The Alu sequence present in the 5'
region is underlined.

B) The corresponding amino acids are shown below the sequence. The coding
sequence between the ATG initiation codon and the TGA stop codon is 2466 bp,
encoding an 821 amino acid protein. The adenine in the first methionine codon
has been assigned position 1. Locations of introns within the nCL1 gene are
indicated by arrowheads. Nucleotides which differ from the previously published
ones are indicated by asterisks.

Figure 3: Alignments of amino acid sequences of the muscle-specific calpains.

The human nCL1 protein is shown on the first line. The 3 muscle-specific
sequences (NS, IS1 and IS2) are underlined. The second line corresponds to the
rat sequence (Accession no P). The third and fourth lines show the deduced
amino acid sequences encoded by pig and bovine Expressed Sequence
Tags (GenBank accession no U05678 and no U07858, respectively). The
amino acid residues which are conserved among all known members of the
calpains are in reverse letters. A period indicates that the same amino acid is
present in the sequence. Letters refer to the variant amino acid found in the
homologous sequence. Positions of missense mutations are given as numbers
above the mutated amino acid.

Figure 4: Distribution of the mutations along nCL1 protein structure.

A) Positions of the 23 introns are indicated by vertical bars in relation to the
corresponding amino acid coordinates.

B) The nCL1 protein is depicted showing the four domains (I, II, III, IV) and
the muscle-specific sequences (NS, IS1 and IS2). The positions of missense
mutations within the nCL1 domains are indicated by black dots. The effects of
nonsense and frameshift mutations are illustrated as truncated lines,
representing the extent of protein synthesised. Names of the corresponding
families are indicated on the left of the line. The out of frame ORF is given by
hatched lines.



Figure 5: Northern blot hybridisation of a nCL1 clone

An mRNA blot (Clontech) containing 2 µg of poly(A)+ RNA from each of
eight human tissues was hybridised with a nCL1 genomic clone spanning exons
20 and 21. The latter detects a 3.6 kb mRNA present only in the lane
corresponding to the skeletal muscle mRNA.

Figure 6: Representative mutations identified by heteroduplex analysis.

Examples of mutation screening by heteroduplex analysis. Pedigree B505
shows the segregation of two different mutations in exon 22.

Figure 7: Homozygous mutations in the nCL1 gene

Detection by sequencing of mutations in exons 2 (a), 8 (b), 13 (c) and 22
(d). Sequences from a healthy control are shown above each mutant sequence.
Asterisks indicate the position of the mutated nucleotides. The consequences on
codon and amino acid residues are indicated on the left of the figure together
with the name of the family.

Figure 8 : Structure of nCL1 gene

Figure 8A represents the 5' part of the gene with exon 1.
Figure 8B represents the part of the gene including exons 2 to 8.
Figure 8C represents the part of the gene including exon 9.
Figure 8D represents the part of the gene including exons 10 to 24,
including the 3' non transcribed region.
EXAMPLES 
EXAMPLE 1 

Localisation of the nCL1 within the LGMD2A interval 

Detailed genetic and physical maps of the LGMD2A region were 
constructed (Fougerousse et al., 1994), following the primary linkage assignment 
to 15q (Beckmann et al., 1991). The disease locus was bracketed between the 
D15S129 and D15S143 markers, defining the cytogenetic boundaries of the 
LGMD2A region as 15q15.1-15q21.1 (Fougerousse et al., 1994). Construction 
and analysis of a 10-12 Mb YAC contig (Fougerousse et al., 1994) permitted us 
to map 33 polymorphic markers within this interval and to further narrow the 
LGMD2A region to between D15S514 and D15S222. 

The nCL1 gene had been localised to chromosome 15 by hybridisation with
sorted chromosomes and by Southern hybridisation to DNA from human-mouse
cell hybrids (Ohno et al., 1989). cDNA capture using YACs from the LGMD2A
interval allowed the identification of thirteen positional candidate genes. nCL1
was one of the two transcripts identified that showed muscle-specific expression,
as evidenced by northern blot analysis. The localisation was further confirmed by
STS (Sequence Tagged Site) assays. Primers used for the localisation of the
nCL1 gene are P94in2, P94in13 and pcr6a3, as shown in Figure 1, their
characteristics being defined in Table 1.

Table 1: PCR primers used for localisation of the nCL1 gene.

Primer name  Primer sequence (5'-3')   Position within  Annealing   PCR product size (bp) on
                                       the cDNA         temp (°C)   cDNA     genomic DNA

P94in2       ATGGAGCCAACAGAACTGAC      341-360          58          108      1758
             GTATGACTCGGAAAAGAAGGT     428-448

P94in13      TAAGCAAAAGCAGTCCCCAC      1893-1912        58          64       1043
             TTGCTGTTCCTCACTTTCCTG     1936-1956

P94-6a3      GTTTCATCTGCTGCTTCGTT      2342-2361        56          130      818
             CTGGTTCAGGCATACATGGT      2452-2471

P94ex1ter    TTCTTTATGTGGACCCTGAGTT    218-239          55          76       76
             ACGAACTGGATGGGGAACT       275-293











These primers were designed from different parts of the published human
cDNA sequence (Sorimachi et al., 1989), and were used for an STS content
screening on DNA from three chromosome 15 somatic cell hybrids and YACs
from the LGMD2A contig. The results positioned the gene in a region previously
defined as 15q15.1-q21.1 and on 3 YACs (774G4, 926G10, 923G7) localised in
this region. The relative positions of STSs along the LGMD2A contig made it
possible to localise the gene between D15S512 and D15S488, in a candidate
region suggested by linkage disequilibrium studies.
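
The product sizes listed in Table 1 can be checked against the primer coordinates: the cDNA amplicon length is the distance between the outer primer ends, and the excess of the genomic amplicon over the cDNA amplicon is the amount of sequence absent from the cDNA (presumably intronic) lying between the two primers. The following is a minimal sketch in Python; the dictionary layout and function names are illustrative assumptions, and the numbers are simply those read from Table 1.

```python
# Values transcribed from Table 1. Each entry is:
# (forward primer start, reverse primer end, cDNA product bp, genomic product bp).
# The data structure and names are illustrative, not part of the original work.
TABLE_1 = {
    "P94in2":    (341, 448, 108, 1758),
    "P94in13":   (1893, 1956, 64, 1043),
    "P94-6a3":   (2342, 2471, 130, 818),
    "P94ex1ter": (218, 293, 76, 76),
}


def cdna_product_size(fwd_start: int, rev_end: int) -> int:
    """Amplicon length on cDNA; both coordinates are inclusive."""
    return rev_end - fwd_start + 1


def extra_genomic_bp(cdna_bp: int, genomic_bp: int) -> int:
    """Sequence present in the genomic amplicon but not in the cDNA amplicon."""
    return genomic_bp - cdna_bp


for name, (start, end, cdna_bp, genomic_bp) in TABLE_1.items():
    computed = cdna_product_size(start, end)
    print(name, computed, computed == cdna_bp, extra_genomic_bp(cdna_bp, genomic_bp))
```

A difference of zero, as for P94ex1ter, is what one would expect for a primer pair lying within a single exon.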

The same primers as above were used to screen a cosmid library from YAC
774G4. A group of 5 cosmids was identified (Fig. 1). Experiments with another
nCL1 primer pair (P94ex1ter; Table 1) established that these cosmids cover all
nCL1 exons except number 1, and that a second group of 4 cosmids contains this
exon (Fig. 1). A minimal set of three overlapping cosmids (2G8-2B11-1F11)
covers the entire gene (Figure 1). DNA from these cosmids was used to
construct an EcoRI restriction map of this region (Figure 1B).
EXAMPLE 2 

Determination of the nCL1 gene sequence

Most of the sequences were obtained through shotgun sequencing of partial
digests of cosmid 1F11 subcloned in M13 and Bluescript vectors, and by walking
with internal primers. The sequence assembly was made using the XBAP
software of the Staden package (Staden) and was in agreement with the
restriction map of the cosmids. Sequences of exon 1 and adjacent regions were
obtained by sequencing cosmid DNA or PCR products from human genomic
DNA. The first intron is still not fully sequenced, but there is evidence that it may
be between 10 and 16 kb in length (based on hybridisation of restriction
fragments; data not shown). The entire gene, including its 5' and 3' regions, is
more than 40 kb long, and is shown in Figure 8.

a) The cDNA sequence

The technology used made it possible to complete the published human
cDNA sequence of nCL1 (Sorimachi 1989). The sequence obtained contains the
missing 129 bases corresponding to the N-terminal 43 amino acids (Figure 2). It
also differs from the published sequence at 12 positions, three of which occur at
third base positions of codons and preserve the encoded amino acid sequence.
The other 9 differences lead to changes in amino acid composition (Figure 2). As
these different exons were sequenced repeatedly on at least 10 distinct genomes,
we are confident that the sequence of Fig. 2 represents an authentic sequence and
does not contain minor polymorphic variants. Furthermore, these modifications
increase the local similarity with the rat nCL1 amino acid sequence (Sorimachi),
although the overall similarity is still 94 %.

The ATG numbered 1 in Figure 2 is the translation initiation site based on
homology with the rat nCL1, and is within a sequence with only 5 nucleotides out
of 8 in common with the Kozak consensus sequence (Kozak M, 1984). Putative
CCAAT and TATA boxes were observed 590 and 324 bp (CCAAT) and 544 or 33 bp
(TATA) upstream of the initiating ATG codon, respectively (Bucher 1990). A GC-
box binding the Sp1 protein (Dynan et al., 1983) was identified at position -477.
Consensus sequences corresponding to potential muscle-specific regulatory
elements were identified (Fig. 2). These include a myocyte-specific enhancer-
binding factor 2 (MEF2) binding site (Cserjesi P. 1991), a CArG box (Minty A.
1986) and 6 E-boxes (binding sites for basic Helix-Loop-Helix proteins frequently
found in members of the MyoD family; Blackwell and Weintraub, 1990). The
functional significance of these putative transcription factor binding sites in the
regulation of nCL1 gene expression remains to be established.

Two potential AAUAAA polyadenylation signals were identified 520 and 777
bp downstream of the TGA stop codon. The sequencing of a partial nCL1 cDNA
containing a polyA tail demonstrated that the first AAUAAA is the
polyadenylation signal. The latter is embedded in a region well conserved with
the rat nCL1 sequence and is followed after 4 bp by a G/T cluster, present in
most genes 3' of the polyadenylation site (Birnstiel et al., 1985). The 3'-
untranslated region of the nCL1 mRNA is 565 bp long. The predicted length of
the cDNA should therefore be approximately 3550 or 3000 bp.
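
The two predicted lengths can be reproduced by adding the 5'-untranslated region implied by each candidate TATA box to the coding sequence and the 3'-untranslated region. The sketch below is illustrative only: the spacing of about 30 bp between a TATA box and the transcription start is a general rule of thumb introduced here as an assumption, since the text does not state which value was used.

```python
CDS_BP = 2466          # ATG to TGA, as given in the Figure 2 legend
UTR3_BP = 565          # 3'-untranslated region, as given in the text
TATA_TO_START_BP = 30  # ASSUMPTION: typical TATA-to-initiation spacing, not from the source


def predicted_cdna_length(tata_upstream_of_atg: int) -> int:
    """Predicted transcript length for a TATA box lying N bp upstream of the ATG."""
    utr5_bp = tata_upstream_of_atg - TATA_TO_START_BP
    return utr5_bp + CDS_BP + UTR3_BP


for tata_position in (544, 33):
    print(tata_position, predicted_cdna_length(tata_position))
```

Under this assumption the two candidate TATA boxes give 3545 and 3034 bp, in line with the approximate values of 3550 and 3000 bp quoted above.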

b) Comparison with other calpains

The sequence of the human nCL1 gene was compared to those of other
calpains (Figure 3). The most telling comparisons are with the
homologous rat (Accession no J05121), bovine (Accession no U07858) and
porcine (Accession no U05678) sequences. The accession numbers refer to
those of international gene banks, such as GenBank (N.I.H.) or the EMBL Database
(EMBL, Heidelberg). High local similarities between the human and rat DNA
sequences are even observed in the 5' (75%) or in different parts of the 3'
untranslated regions (over 60%) (data not shown). The high extent of sequence
homology manifested by the human and rat nCL1 genes in their untranslated
regions is suggestive of evolutionary pressures on common putative regulatory
sequences.

c) Genomic organisation of the nCL1 gene

A comparison of the published nCL1 human cDNA (Sorimachi et al., 1989)
with the corresponding genomic sequence led to the identification of 24 exons
ranging in length from 12 bp (exon 12) to 309 bp (exon 1), with a mean size of
100 bp (Figure 1). The size of introns ranges from 86 bp to about 10-16 kb for
intron 1.
The intron-exon boundaries as shown in Table 2 exhibit close adherence to
5' and 3' splice site consensus sequences (Shapiro and Senapathy, 1987).

Table 2: Sequences at the intron-exon junctions. A score expressing adherence
to the consensus was calculated for each site according to Shapiro and
Senapathy (1987). Sequences of exons and introns are in upper and lower
case, respectively. Sizes of exons are given in parentheses. The acceptor site
shown on each row is the one that closes the intron named on that row.

Exon (size)        Splice donor site     Score (%)  Intron     Score (%)  Splice acceptor site

Exon 1 (309 bp)    ...CTCCGgtgagt...     88.5       Intron 1   99.0       [missing]
Exon 2 (70 bp)     ...GCTAGgtagga...     83.5       Intron 2   90.0       [missing]
Exon 3 (119 bp)    ...TCCAGgtgagg...     92         Intron 3   81.5       [missing]
Exon 4 (134 bp)    ...GCTAAgtaagc...     82         Intron 4   81.5       [missing]
Exon 5 (169 bp)    ...TTGATgtaagt...     87         Intron 5   79.5       [illegible]
Exon 6 (144 bp)    ...CCCGGgtgtg[?]...   77.5       Intron 6   91         [illegible]
Exon 7 (84 bp)     ...ATGAGgtaagc...     94         Intron 7   78.5       [illegible]
Exon 8 (86 bp)     ...GATAGgtaggt...     89         Intron 8   91.5       [illegible]
Exon 9 (78 bp)     ...TTCTGgtgagt...     88         Intron 9   92         [illegible]
Exon 10 (161 bp)   ...CCCAGgtggga...     80         Intron 10  68.5       [missing]
Exon 11 (170 bp)   ...ACGAGgtgtgt...     85.5       Intron 11  86         [missing]
Exon 12 (12 bp)    ...AAGAGgtatag...     70         Intron 12  87         [illegible]
Exon 13 (209 bp)   ...TCTGAgtgagt...     76.5       Intron 13  97         ...tgtattcctcacagGGAAG...
Exon 14 (37 bp)    ...CAGTGgtgagt...     89         Intron 14  93.5       ...cttttcttatgcagAAAAA...
Exon 15 (18 bp)    ...CCAAGgtaggt...     89         Intron 15  87         ...cctcctctctccagCCCAT...
Exon 16 (114 bp)   ...CACAGgtgtct...     80         Intron 16  88         ...ttgtgcctccacagCCACA...
Exon 17 (78 bp)    ...GAGATgtgagt...     84         Intron 17  92.5       ...cccttcctcctcagGACAT...
Exon 18 (58 bp)    ...CAAACgtgagt...     83         Intron 18  90         ...[?]tccatccccccagACAAG...
Exon 19 (65 bp)    ...TGGATgtatcc...     56         Intron 19  88         ...cctccctcctccagACAGA...
Exon 20 (69 bp)    ...GGCAGgtggga...     80         Intron 20  94         ...ttttctattgccagAAATA...
Exon 21 (79 bp)    ...CGCAGgtgctg...     66         Intron 21  91         ...ggtcccctccacagGATTC...
Exon 22 (117 bp)   ...GTTCAgtaagt...     79         Intron 22  93.5       ...gcattctttcacagGAGCT...
Exon 23 (59 bp)    ...TGGAGgtaaag...     81         Intron 23  79         ...gggacttctttcagTGGCT...
Exon 24 (27 bp)
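
As a consistency check on Table 2, the exon sizes can be summed and compared with the 2466 bp coding sequence stated in the legend of Figure 2. The sketch below is illustrative only: the list is transcribed from the table, and the reading that the sizes given for exons 1 and 24 cover only their coding portions is an inference from the arithmetic, not a statement made in the text.

```python
# Exon sizes (bp) for exons 1 to 24, transcribed from Table 2.
EXON_BP = [309, 70, 119, 134, 169, 144, 84, 86, 78, 161, 170, 12,
           209, 37, 18, 114, 78, 58, 65, 69, 79, 117, 59, 27]

CDS_BP = 2466  # ATG..TGA length given in the Figure 2 legend


def summarise(exon_bp):
    """Return simple summary statistics for a list of exon sizes."""
    total = sum(exon_bp)
    return {
        "n_exons": len(exon_bp),
        "total_bp": total,
        "mean_bp": total / len(exon_bp),
        "smallest_exon": exon_bp.index(min(exon_bp)) + 1,  # 1-based exon number
        "largest_exon": exon_bp.index(max(exon_bp)) + 1,
        "amino_acids": total // 3 - 1,  # the final codon is the stop codon
    }


summary = summarise(EXON_BP)
print(summary)
print("matches coding sequence length:", summary["total_bp"] == CDS_BP)
```

The total agrees with the 2466 bp coding sequence and the 821-residue protein, and the mean of about 103 bp is close to the 100 bp quoted in the text.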



When the genomic sequence was submitted to GRAIL analysis (Uberbacher 
et al., 1991), 11 exons were correctly recognised, 4 were not identified, 6 were 
inadequately defined and 2 were too small to be recognised (data not shown). 



As already noted, the nCL1 gene has three unique sequence blocks, NS
(amino acid residues 1 to 61), IS1 (residues 267 to 329) and IS2 (residues 578
to 653). It is interesting to note that each of these sequences, as well as the
nuclear translocation signal inside IS2, is essentially flanked by introns (Fig. 4).
The exon-intron organisation of the human nCL1 is similar to that reported for
the chicken CANP (the only other large subunit calpain gene whose genomic
structure is known; Emori et al., 1986).

Four microsatellite sequences were identified. Two of them are in the distal
part of the first intron: an (AT)14 and a previously identified mixed-pattern
microsatellite, S774G4B8, which was demonstrated to be non polymorphic
(Fougerousse et al., 1994). A (TA)7(CA)4(GA)13 was identified in the second
intron and genotyping of 64 CEPH unrelated individuals revealed two alleles
(with frequencies of 0.10 and 0.90). The fourth microsatellite is a mixed
(CA)n(TA)m repeat present in the 9th intron. The latter and the (AT)14 repeat
have not been investigated for polymorphism. Fourteen repetitive sequences of
the Alu family and one Mer2 repeat were identified in the nCL1 gene (Fig. 1C),
which thus has on average one Alu element per 2.5 kb.

Southern blot experiments (Ohno et al., 1989) and STS screening (data not
shown) suggest that there is but one copy per genome of this member of the
calpain family.

EXAMPLE 3

Expression of the nCL1 gene

The pattern of tissue-specificity was investigated by northern blot
hybridisation with a genomic subclone probe from cosmid 1F11 spanning exons
20 and 21. There is no evidence for the existence of an alternatively spliced form
of nCL1, although this cannot be excluded. A transcript of about 3.4-3.6 kb was
detected in skeletal muscle mRNA (Figure 5). This size therefore favours
position -544 as the functional TATA box.

Transcription studies suggested that it is an active gene rather than a 
pseudogene and its muscle-specific pattern of expression is consistent with the 
phenotype of this disorder (Sorimachi et al., 1989 and Figure 5). 

EXAMPLE 4 

Mutation screening 

nCL1 fulfils both positional and functional criteria to be a candidate gene for 
LGMD2A. To evaluate its role in the etiology of this disorder, nCL1 was 
systematically screened in 38 LGMD2 families for the presence of nucleotide 
changes using a combination of heteroduplex (Keen et al., 1991) and direct 
sequence analyses. 

PCR primers were designed to specifically amplify the exons and splice
junctions and also the regions containing the putative CAAT and TATA boxes and
the polyadenylation signal of the gene, as shown in Table 3.

Table 3: PCR primers used for the analysis of the nCL1 gene in LGMD patients.

Amplified region        Primer sequences (5'-3')    Size (bp)  Annealing temp. (°C)

promoter                TTCAGTACCTCCCGTTCACC        296        59
                        GATGCTTGAGCCAGGAAAAC
exon 1                  CTTTCCTTGAAGGTAGCTGTAT      438        60
                        GAGGTGCTGAGTGAGAGGAC
exon 2                  ACTCCGTCTCAAAAAAATACCT      239        57
                        ATTGTCCCTTTACCTCCTGG
exon 3                  TGGAAGTAGGAGAGTGGGCA        354        58
                        GGGTAGATGGGTGGGAAGTT
exon 4                  GAGGAATGTGGAGGAAGGAC        292        59
                        TTCCTGTGAGTGAGGTCTCG
exon 5                  GGAACTCTGTGACCCCAAAT        325        56
                        TCCTCAAACAAAACATTCGC
exon 6                  GTTCCCTACATTCTCCATCG        315        57
                        GTTATTTCAACCCAGACCCTT
exon 7                  AATGGGTTCTCTGGTTACTGC       333        56
                        AGCACGAAAAGCAAAGATAAA
exon 8                  GTAAGAGATTTGCCCCCCAG        321        58
                        TCTGCGGATCATTGGTTTTG
exon 9                  CCTTCCCTTCTTCCTGCTTC        173        56
                        CTCTCTTCCCCACCCTTACC
exon 10                 CCTCCTCACCTGCTCCCATA        251        56
                        TTTTTCGGCTTAGACCCTCC
exon 11                 TGTGGGGAATAGAAATAAATGG      355        57
                        CCAGGAGCTCTGTGGGTCA
exon 12                 GGCTCCTCATCCTCATTCACA       312        61
                        GTGGAGGAGGGTGAGTGTGC
exon 13                 TGTGGCAGGACAGGACGTTC        337        60
                        TTCAACCTCTGGAGTGGGCC
exon 14                 CACCAGAGCAAACCGTCCAC        230        61
                        ACAGCCCAGACTCCCATTCC
exon 15                 TTCTCTTCTCCCTTCACCCT        225        57
                        ACACACTTCATGCTCTCTACCC
exon 16                 CCGCCTATTCCTTTCCTCTT        331        56
                        GACAAACTCCTGGGAAGCCT
exon 17                 ACCTCTGACCCCTGTGAACC        270        61
                        TGTGGATTTGTGTGCTACGC
exon 18                 CATAAATAGCACCGACAGGGA       258        59
                        GGGATGGAGAAGAGTGAGGA
exon 19                 TCCTCACTCTTCTCCATCCC        159        57
                        ACCCTGTATGTTGCCTTGG
exons 20-21             GGGGATTTTGCTGTGTGCTG        333        61
                        ATTCCTGCTCCCACCGTCTC
exon 22                 CACAGAGTGTCCGAGAGGCA        282        57
                        GGAGATTATCAGGTGAGATGCC
exons 22-23             CAGAGTGTCCGAGAGGCAGGG       608        61
                        CGTTGACCCCTCCACCTTGA
exon 24                 GGGAAAACATGCACCTTCTT        375 *      58
                        TAGGGGGTAAAATGGAGGAG
polyadenylation signal  ACTAACTCAGTGGAATAGGG        413        56
                        GGAGCTAGGATAGCTCAAT



PCR products made on DNA from blood of specific LGMD2A patients were
then subjected either to direct sequencing or to heteroduplex analysis,
depending on whether the mutation, based on haplotype analysis, was expected
to be homozygous or heterozygous, respectively. It was occasionally necessary
to clone the PCR products to precisely identify the mutations (i.e., for
microdeletions or insertions and for some heterozygotes). Disease-associated
mutations are summarised in Table 4 hereunder and their position along the
protein is shown in Fig. 4.

Table 4: nCL1 mutations in LGMD2A families.

Codons and amino acid positions are numbered on the basis of the cDNA
sequence starting from ATG.

Exon  Families         Nucleotide  Nucleotide change  Amino acid  Amino acid   Restriction
                       position                       position    change       site

2     B519*            328         CGA -> TGA         110         Arg -> stop
4     M42              545         CTG -> CAG         182         Leu -> Gln
4     M1394; M2888     550         CAA -> CA          184         frameshift
5     M35; M37         701         GGG -> GAG         234         Gly -> Glu
6     M32              945         CGG -> CG          315         frameshift   -SmaI
8     M2407*           1061        GTG -> GGG         354         Val -> Gly
8     M1394            1079        TGG -> TAG         360         Trp -> stop  -BstNI
11    M2888            1468        CGG -> TGG         490         Arg -> Trp
13    R12*             1715        CGG -> CAG         572         Arg -> Gln   -MspI
19    [illegible]      2069-2070   deletion AC        690         frameshift
21    R14; R17         2230        AGC -> GGC         744         Ser -> Gly   -AluI
22    A*; B501*; M32   2306        CGG -> CAG         769         Arg -> Gln
22    B505             2313-2316   deletion AGAC      771-772     frameshift
22    R14; B505        2362-2363   AG -> TCATCT       788         frameshift


The first letter of the family code refers to the origin of the population: B = Brazil,
M = metropolitan France, R = Isle of La Reunion, A = Amish.
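
The numbering convention of Table 4 (codons counted from the A of the initiating ATG) can be checked with a short calculation. The following is a minimal Python sketch added for illustration and is not part of the original disclosure; the helper name `codon_number` is hypothetical.

```python
def codon_number(nucleotide_position: int) -> int:
    """Return the 1-based codon that contains a 1-based cDNA position.

    Assumes position 1 is the A of the initiating ATG, as stated for Table 4.
    """
    if nucleotide_position < 1:
        raise ValueError("cDNA positions are 1-based")
    return (nucleotide_position - 1) // 3 + 1


# Nucleotide positions taken from Table 4.
for position in (545, 701, 1715, 2306):
    print(position, codon_number(position))
```

For nucleotide 1715 the function gives codon 572, and for nucleotide 2306 it gives codon 769, in agreement with the amino acid positions listed for these two mutations.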

Each mutation was confirmed by heteroduplex analysis, by sequencing of 
both strands in several members of the family or by enzymatic digestion when 
the mutation resulted in the modification of a restriction site. Segregation 
analyses of the mutations, performed on DNAs from all available members of the 
families, confirmed that these sequence variations are on the parental 
chromosome carrying the LGMD2A mutation. To exclude the possibility that the 
missense substitutions might be polymorphisms, their presence was 
systematically tested in a control population: none of these mutations was seen 
among 120 control chromosomes from the CEPH reference families. 

EXAMPLE 5

Analysis of families: chromosome 15-ascertained families
The initial screening for causative mutations was performed on families,
each containing an LGMD gene located on chromosome 15. These included
families from the Island of La Reunion (Beckmann et al., 1991), from the Old
Order Amish from northern Indiana (Young et al., 1992) and 2 Brazilian families
(Passos Bueno et al., 1993).
a) Reunion Island families

Genealogical studies and geographic isolation of the families from the Isle
of La Reunion were suggestive of a single founder effect. Genetic analyses are,
however, inconsistent with this hypothesis as the families present haplotype
heterogeneity. At least six different carrier chromosomes are encountered (with
affected individuals in several families being compound heterozygotes). Distinct
mutations corresponding to four of these six haplotypes have been identified
thus far.

In family R14, exons 13, 21 and 22 showed evidence for sequence variation
upon heteroduplex analysis (Fig. 6). Sequencing of the associated PCR products
revealed (i) a polymorphism in exon 13, (ii) a missense mutation (A->G) in exon
21 transforming the Ser 744 residue to a glycine in the loop of the second
EF-hand in domain IV of the protein (Figure 4), and (iii) a frameshift mutation in exon
22. The exon 21 mutation and the polymorphism in exon 13 form a haplotype
which is also encountered in family R17. Subcloning of the PCR products was
necessary to identify the exon 22 mutation. Sequencing of several clones
revealed a replacement of AG by TCATCT (data not shown). This frameshift
mutation causes premature termination at nucleotide 2400, where an in-frame
stop codon occurs (Figure 4).
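
How a replacement such as AG by TCATCT shifts the reading frame and exposes a premature stop codon can be illustrated on a toy sequence. The sketch below is an illustration only: the 24-nucleotide open reading frame is hypothetical and is NOT the nCL1 sequence, and the helper names `first_in_frame_stop` and `replace_segment` are introduced here for the example.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cdna: str) -> Optional[int]:
    """Return the 1-based position of the first base of the first in-frame
    stop codon (reading from position 1), or None if there is none."""
    cdna = cdna.upper()
    for i in range(0, len(cdna) - 2, 3):
        if cdna[i:i + 3] in STOP_CODONS:
            return i + 1
    return None


def replace_segment(cdna: str, start: int, old: str, new: str) -> str:
    """Replace `old`, expected at 1-based position `start`, with `new`."""
    begin = start - 1
    if cdna[begin:begin + len(old)] != old:
        raise ValueError("reference segment not found at the given position")
    return cdna[:begin] + new + cdna[begin + len(old):]


# Toy open reading frame (hypothetical, not the nCL1 cDNA).
normal = "ATGGCAGAGCTTGACCGGTGGTAA"
# Replace the AG at positions 8-9 by TCATCT: a net gain of 4 nucleotides,
# which is not a multiple of 3 and therefore shifts the reading frame.
mutant = replace_segment(normal, 8, "AG", "TCATCT")

print(first_in_frame_stop(normal))
print(first_in_frame_stop(mutant))
```

In this toy case the normal frame ends at its natural stop codon, whereas the shifted frame meets a TGA codon earlier, which is the behaviour described above for the exon 22 mutation.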

The affected individuals in family R12 are homozygous for all markers of the
LGMD2A interval (Allamand, submitted). Sequencing of the PCR products of
exon 13 revealed a G to A transition at base 1715 of the cDNA resulting in a
substitution of glutamine for Arg 572 (Figure 7) within domain III, a residue which
is highly conserved throughout all known calpains. This mutation, detectable by
loss of an MspI restriction site, is present only in this family and in no other
examined LGMD2A families or unrelated controls.
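
The loss of the MspI site can be verified directly on the short exon 13 sequences shown for the normal allele and for family R12 in the sequence figure of this document. The following Python sketch is an illustration only; the helper name `has_site` is hypothetical, and the check assumes the standard MspI recognition sequence CCGG.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence


def has_site(sequence: str, site: str = MSPI_SITE) -> bool:
    """Return True if the recognition site occurs in the sequence."""
    cleaned = sequence.upper().replace(" ", "")
    return site in cleaned


normal_exon13 = "TCCTCCGGGTCTT"  # normal sequence around codon 572
r12_exon13 = "TCCTCCAGGTCTT"     # family R12 sequence (CGG -> CAG)

print(has_site(normal_exon13))
print(has_site(r12_exon13))
```

The normal sequence contains CCGG while the R12 sequence does not, so an MspI digest of the exon 13 PCR product would be expected to distinguish the two alleles.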

In family R27, heteroduplex analysis followed by sequencing of the PCR
products of an affected child revealed a two base pair deletion in exon 19
(Figure 6 and Table 4). One AC out of three is missing at this position of the
sequence, producing a stop codon at position 2069 of the cDNA sequence
(Figure 4).

b) Amish families

As expected, due to multiple consanguineous links, the examined LGMD2A
Northern Indiana Amish patients were homozygous for the haplotype on the
chromosome bearing the mutant allele (Allamand, submitted). A (G->A)
missense mutation was identified at nucleotide 2306 within exon 22 (Fig. 7). The
resulting codon change is CGG to CAG, transforming Arg769 to glutamine. This
residue, which is conserved throughout all members of the calpain family in all
species, is located in domain IV of the protein within the 3rd EF-hand at the
helix-loop junction (ref). This mutation was encountered in a homozygous state
in all patients from 12 chromosome 15-linked Amish families, in agreement with
the haplotype analysis. We also screened six Southern Indiana Amish LGMD
families, for which the chromosome 15 locus was excluded by linkage analyses
(Allamand ESHG, submitted, ASHG 94). As expected, this nucleotide change
was not present in any of the patients from these families, thus confirming the
genetic heterogeneity of this disease in this genetically related isolate.

c) Brazilian families

As a result of consanguineous marriages, two Brazilian families (B501,
B519) are homozygous for extended LGMD2A carrier haplotypes (data not
shown). Sequencing PCR products from affected individuals of these families
demonstrated that family B501 has the same exon 22 mutation found in northern
Indiana Amish patients (Figure 7), but embedded in a completely different
haplotype. In family B519, the patients carry a C to T transition in exon 2,
replacing Arg110 with a TGA stop codon (Figure 7), thus leading, presumably, to
a very truncated protein (Figure 4).

d) Analysis of other LGMD families

Having validated the role of the candidate gene in the chromosome 15
ascertained families, we next examined by heteroduplex analysis LGMD families
for which linkage data were not informative. These included one Brazilian (B505)
and 13 metropolitan French pedigrees.

Heteroduplex bands were revealed for exons 1, 3, 4, 5, 6, 8, 11 and 22 of one
or more patients (Figure 6). Of all sequence variants, 10 were identified as
possible pathogenic mutations (5 missense, 1 nonsense and 4 frameshift
mutations) and 3 as polymorphisms with no change of amino acid of the protein.
All causative mutations identified are listed in Table 4 here-above. Identical
mutations were uncovered in apparently unrelated families. The mutations
shared by families M35 and M37, and M2888 and M1394, respectively, are likely
to be the consequence of independent events since they are embedded in
different marker haplotypes. In contrast, it is likely that the point mutation in exon
22 of the Amish and in the M32 kindreds corresponds to the same mutational
event as both chromosomes share a common four marker haplotype (774G4A1-
774G4A10-774G454D-774G4A2) around nCL1 (data not shown), possibly
reflecting a common ancestor. The same holds true for the AG to TCATCT
substitution mutation encountered in exon 22 in families B505 and R14. The
exon 8 (T->G) transversion is present in the two carrier chromosomes of M2407,
the only metropolitan family homozygous by haplotype, possibly reflecting an
undocumented consanguinity. For some families, no disease-causing mutation
has been detected thus far (M40 for example).

In addition to the polymorphism present in exon 13 in families R14 and R17
(position 668) and in the intragenic microsatellites, four additional neutral
variations were detected: a (T->C) transition at position 96, abolishing a DdeI
restriction site in exon 1 in M31; a (C->T) transition in exon 3 (position 495) in
M40 and in M37 forming a haplotype with the exon 5 mutation (in the former
family, this polymorphism does not cosegregate with the disease); a (T->C)
transition in the paternally derived promoter in M42 at position -428, which was
also evidenced in healthy controls; and a variable poly(G) in intron 22 close to
the splice site in families R20, R11, R19, M35 and M37. The latter is also
present in the members of the CEPH families, but is not useful as a genetic
marker as the visualisation and interpretation of mononucleotide repeat alleles is
difficult.

In total, sixteen independent mutational events representing fourteen
different mutations were identified. All mutations cosegregate with the disease in
LGMD2A families. The characterised morbid calpain alleles contain nucleotide
changes which were not found in alleles from normal individuals. The discovery of
two nonsense and five frameshift mutations in nCL1 supports the hypothesis that
a deficiency of this product causes LGMD2A. All seven mutations result in a
premature in-frame stop codon, leading to the production of truncated and
presumably inactive proteins (Figure 4). Evidence for the morbidity of the
missense mutations comes from (1) the relatively high incidence of such mutations
among LGMD2A patients; although it is difficult in the absence of functional
assays to differentiate between a polymorphism and a morbid mutation, the
occurrence of different "missense" mutations in this gene cannot all be
accounted for as rare private polymorphisms; (2) the failure to observe these
mutations in control chromosomes; and (3) the occurrence of mutations in
evolutionarily conserved residues and/or in regions of documented functional
importance. Four of seven missense mutations change an amino acid which is
conserved in all known members of the calpain family in all species (Figure 3).
Two of the remaining mutations affect less conserved amino acid residues, but
are located in important functional domains. The substitution V354G in exon 8 is
4 residues before the asparagine at the active site and S744G in exon 21 is
within the loop of the second EF-hand and may impair the calcium-dependent
regulation of calpain activity or the interaction with a small subunit (Figure 4).
Several missense mutations change a hydrophobic residue to a polar one, or
vice versa (Table 4), possibly disrupting higher order structures.
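
The breakdown of the fourteen mutations into classes can be recomputed from the amino acid change column of Table 4. The following Python sketch is an illustration only; the list transcribes that column (one entry per distinct mutation), and the helper name `classify` is hypothetical.

```python
from collections import Counter

# Amino acid changes as listed in Table 4, one entry per distinct mutation.
TABLE4_CHANGES = [
    "Arg->stop", "Leu->Gln", "frameshift", "Gly->Glu", "frameshift",
    "Val->Gly", "Trp->stop", "Arg->Trp", "Arg->Gln", "frameshift",
    "Ser->Gly", "Arg->Gln", "frameshift", "frameshift",
]


def classify(change: str) -> str:
    """Assign a Table 4 amino acid change to a mutation class."""
    if change == "frameshift":
        return "frameshift"
    if change.endswith("->stop"):
        return "nonsense"
    return "missense"


counts = Counter(classify(change) for change in TABLE4_CHANGES)
print(dict(counts))
```

The tally gives two nonsense, five frameshift and seven missense mutations, consistent with the figures quoted in the text.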
METHODS 

Description of the patients 

The LGMD2A families analysed were from 4 different geographic origins.
They included 3 Brazilian families, 13 interrelated nuclear families from the Isle
of La Reunion, 10 French metropolitan families and 12 US Amish families. The
majority of these families were previously ascertained to belong to the
chromosome 15 group by linkage analysis (Beckmann et al., 1991; Young et al.,
1992; Passos-Bueno et al., 1993). However, some families from metropolitan
France as well as one Brazilian family, B505, had non-significant lod scores for
chromosome 15. Genomic DNA was obtained from peripheral blood lymphocytes.

Sequencing of cosmid C774G4-1F11 and EcoRI restriction map of cosmids
Cosmid 1F11 (Figure 1C) was subcloned following DNA preparation through
Qiagen procedure (Qiagen Inc., USA) and partial digestion with either Sau3A,
RsaI or AluI. Size-selected restriction fragments were recovered from low-melting
agarose and eventually ligated with M13 or Bluescript (Stratagene, USA)
vectors. After electroporation in E. coli, recombinant colonies were picked in 100
µl of LB/ampicillin media. PCR reactions were performed on 1 µl of the culture in
10 mM Tris-HCl, pH 9.0, 50 mM KCl, 1.5 mM MgCl2, 0.1% Triton X-100, 0.01%
gelatine, 200 µM of each dNTP, 1 U of Taq Polymerase (Amersham) with 100 ng
of each vector primer. Amplification was initiated by 5 min denaturation at
95°C, followed by 30 cycles of 40 sec denaturation at 92°C and 30 sec annealing
at 50°C. PCR products were purified through Microcon devices (Amicon, USA)
and sequenced using the dideoxy chain termination method on an ABI
sequencer (Applied Biosystems, Foster City, USA). The sequences were
analysed and alignments performed using the XBAP software of the Staden
package, version 93.9 (Staden, 1982). Gaps between sequence contigs were
filled by walking with internal primers. EcoRI restriction mapping of cosmids was
performed essentially as described in Sambrook et al. (1989).
Northern Blot analysis

The probes were labelled by random priming with dCTP-(α-32P).
Hybridisation was performed to human multiple tissue northern blots as
recommended by the manufacturer (Clontech, USA).
Analysis of PCR products from LGMD2A families 

One hundred ng of human DNA were used per PCR under the buffer and
cycle conditions described in Fougerousse (1994) (annealing temperature shown
in Table 3). Heteroduplex analysis (Keene et al., 1991) was performed by
electrophoresis of 10 µl of PCR products on 1.5 mm-thick Hydrolink MDE gels
(Bioprobe) at 500-600 volts for 12-15 h depending on the fragment length.
Migration profile was visualised under UV after ethidium bromide staining.

For sequence analysis, the PCR products were subjected to dye-dideoxy
sequencing, after purification through Microcon devices (Amicon, USA). When
necessary, depending on the nature of the mutations (e.g., frameshift mutation or
for some heterozygotes), the PCR products were cloned using the TA cloning kit
from Invitrogen (UK). One µl of product was ligated to 25 ng of vector at 12°C
overnight. After electroporation into XL1-blue bacteria, several independent
clones were analysed by PCR and sequenced as described above.

The invention results from the finding that the nCL1 gene, when it is mutated,
is involved in the etiology of LGMD2A. This is exactly the contrary of what is stated
in the literature, namely that the disease is accompanied by the presence of a
deregulated calpain. Identification of nCL1 as the defective gene in LGMD2A
represents the first example of muscular dystrophy caused by a mutation affecting
a gene which is not a structural component of muscle tissue, in contrast with
previously identified muscular dystrophies such as Duchenne and Becker
(Bonilla et al., 1988), severe childhood autosomal recessive (Matsumara et al.,
1992), Fukuyama (Matsumara et al., 1993) and merosin-deficient congenital
muscular dystrophies (Tome et al., 1994).

The understanding of the LGMD2A phenotype needs to take into account 
the fact that there is no active nCL1 protein in several patients, a loss compatible 
with the recessive manifestation of this disease. Simple models in which this 
protease would be involved in the degradation or destabilisation of structural 
components of the cytoskeleton, extracellular matrix or dystrophin complex can 
therefore be ruled out. Furthermore, there are no signs of such alterations by 
immunocytogenetic studies on LGMD2 muscle biopsies (Matsumara et al., 1993; 
Tome et al., 1994). Likewise, since LGMD2A myofibers are apparently not 
different from other dystrophic ones, it seems unlikely that this calpain plays a 
role in myoblast fusion, as proposed for ubiquitous calpains (Wang et al., 1989). 

All the data disclosed in these examples confirm that the nCL1 gene is a 
major gene involved in the disease when mutated. 

The fact that morbidity results from the loss of an enzymatic activity raises 
hopes for novel pharmaco-therapeutic prospects. The availability of transgenic 
models will be an invaluable tool for these investigations. 

The invention also relates to the use of a nucleic acid or a sequence of
nucleic acid of the invention, or to the use of a protein coded by the nucleic acid,
for the manufacturing of a drug for the prevention or treatment of LGMD2.

The finding that a defective calpain underlies the pathogenesis of LGMD2A 
may prove useful for the identification of the other loci involved in the LGMDs. 
Other forms of LGMD may indeed be caused by mutations in genes whose 
products are the CANP substrates or in genes involved in the regulation of nCL1 
expression. Techniques such as the two-hybrid selection system (Fields et al., 
1989) could lend themselves to the isolation of the natural protein substrate(s) of 
this calpain, and thus potentially help to identify other LGMD loci. 

The invention also relates to the use of all or a part of the peptidic sequence
of the enzyme, or of the enzyme, product of nCL1 gene, for the screening of the
ligands of this enzyme, which might be also involved in the etiology and the
morbidity of LGMD2.

The ligands which might be involved are for example substrate(s), activators
or inhibitors of the enzyme.

The nucleic acids of the invention might also be used in a screening method
for the determination of the components which may act on the regulation of the
gene expression.

A process of screening using either the enzyme or a host recombinant cell,
containing the nCL1 gene and expressing the enzyme, is also a part of the
invention.

The pharmacological methods, and the use of nucleic acid and peptidic 
sequences of the invention are very potent applications. 

The methods used for such screenings of ligands or regulatory elements are
those described for example for the screening of ligands using cloned receptors.

The identification of mutations in the nCL1 gene provides the means for
direct prenatal or presymptomatic diagnosis and carrier detection in families in
which both mutations have been identified. Gene-based accurate classification
of LGMD2A families should prove useful for the differential diagnosis of this
disorder.

The invention relates to a method of detection of a predisposition to LGMD2
in a family or a human being, such method comprising the steps of:

- selecting one or more exons or flanking sequences which are sensitive in
said family;

- selecting the primers specific for this or these exons or their flanking
sequences, a specific example being the PCR primers of Table 3, or a hybrid
thereof;

- amplifying the nucleic acid sequence, the substrate for this amplification
being the DNA of the human being to be checked for the predisposition; and

- comparing the amplified sequence to the corresponding sequence derived
from Figure 2 or Figure 8.
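
The final comparison step can be sketched for the simple case of a point substitution. The Python example below is an illustration under stated assumptions, not the procedure of the invention: it assumes that the amplified sequence and the reference have already been aligned and have the same length, the helper name `point_differences` is hypothetical, and the two 13-nucleotide strings are the normal and mutant exon 22 sequences shown in the sequence figure of this document.

```python
from typing import List, Tuple


def point_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for each mismatch.

    Only handles substitutions: both sequences must have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference_exon22 = "CCATGCGGTACGC"  # normal sequence
patient_exon22 = "CCATGCAGTACGC"    # sequence carrying CGG -> CAG

print(point_differences(reference_exon22, patient_exon22))
```

Insertions and deletions, such as the frameshift mutations of Table 4, would require a proper sequence alignment before comparison and are outside the scope of this sketch.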

Table 2 indicates the sequences of the intron-exon junctions, and primers
comprising in their structure these junctions are also included in the invention.
All other primers suitable for such RNA or DNA amplification may be used in
the method of the invention.

In the same way, any suitable amplification method, such as PCR (Polymerase
Chain Reaction®), NASBA® (Nucleic Acid Sequence Based Amplification)
or others, might be used.

The methods usually used in the detection of single-site mutations, like ASO
(Allele specific PCR), LCR, or ARMS (Amplification Refractory Mutation System)
may be implemented with the specific primers of the invention.

The primers, such as described in Tables 1 and 3, or including junctions of
Table 2, or more generally including the flanking sequences of one of the 24
exons are also a part of the invention.

The kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification is also in the scope of the invention; such a kit comprises at least
PCR primers selected from the group of:

a) those described in Table 1;

b) those described in Table 3;

c) those including the intron-exon junctions of Table 2;

d) those derived from primers defined in a), b) or c).

The nucleic acid sequence of claims 1 to 3 might be inserted in a viral or a
retroviral vector, said vector being able to transfect a packaging cell line.

The transfected packaging cell line might be used as a drug for gene
therapy of LGMD2.

The treatment of LGMD2 disease by gene therapy is implemented by a
pharmaceutical composition containing a component selected from the group of:

a) a nucleic acid sequence according to claims 1 to 4,

b) a cell line according to claim 24,

c) an amino acid sequence according to claims 5 to 9.

CLAIMS

1. A nucleic acid sequence comprising:

1) the sequence represented in Figure 8; or

2) the sequence represented in Figure 2; or

3) a part of the sequence of Figure 2 with the proviso that it is able to
code for a protein having a calcium dependent protease activity involved in a
LGMD2 disease; or

4) a sequence derived from a sequence defined in 1), 2) or 3) by
substitution, deletion or addition of one or more nucleotides with the proviso that
said sequence still codes for said protease.

2. A nucleic acid sequence that is complementary to a nucleic acid
sequence according to claim 1.

3. A nucleic acid sequence comprising in its structure a nucleotidic
sequence according to claim 1 or 2, under the control of regulatory elements,
and involved in the expression of calpain activity in a LGMD2 disease.

4. A nucleic acid sequence encoding the amino acid sequence represented
in Figure 2.

5. An amino acid sequence which is coded by a nucleic acid sequence
according to claims 1 to 4.

6. An amino acid sequence according to claim 5, characterized in that it is a
calcium dependent protease enzyme belonging to the calpain family, involved in
the etiology of LGMD2.

7. An amino acid sequence according to claim 5 or 6 such as represented in
Figure 2.

8. An amino acid sequence according to claims 5 to 7, characterized in that
the amino acid sequence is modified by deletion, insertion and/or replacement of
one or more amino acids with the proviso that such amino acid sequence has the
calpain activity involved in LGMD2 disease.

9. An amino acid sequence according to claim 6 to 8, characterized in that
LGMD2 is LGMD2A.

10. A host cell unable to express a calpain enzyme activity, characterized in
that it is transformed or transfected with a nucleic acid sequence comprising all
or part of the nucleic acid sequence according to any one of claims 1 to 4.

11. Use of a nucleic acid according to one of claims 1 to 4 or a host cell
according to claim 10 in the manufacturing of a drug for the prevention or the
treatment of an LGMD2 disease.

12. Use of an amino acid sequence according to claims 5 to 8 in the 
manufacturing of a drug for the prevention or the treatment of an LGMD2 
disease. 

13. Use according to claims 10 or 11, characterized in that LGMD2 is 
LGMD2A. 

14. Use of an amino acid sequence according to claims 5 to 9 for the 
screening of the ligands of said amino acid sequence, said ligand being selected 
in a group consisting of substrate(s), co-factors or regulatory components. 

15. Use of a nucleic acid sequence according to one of claims 1 to 4 in a 
screening method for the determination of the components which may act on the 
regulation of gene expression of calpain. 

16. Use of an host cell according to claim 10 in a screening method for the 
determination of components active on the expression of the calpain. 

17. A method for detecting a predisposition to a LGMD2 disease in a
family or a human being, such method comprising the steps of:

- selecting one or more exons or their flanking sequences of the gene, 

- selecting primers specific for these exons, or their flanking sequences, or 
an hybrid thereof, 

- amplifying the nucleic acid sequences with these primers, the substrate for 
this amplification being the DNA of a human being; and 

- comparing the amplified sequence to the corresponding sequence derived 
from Figure 2 or Figure 8. 

18. The method according to claim 17, characterized in that the primers are 
those selected from the group of : 

a) those described in Table 1 ; 

b) those described in Table 3; and 

c) those including the introns-exons junctions of Table 2; 

d) those derived from the primers in a), b), or c). 

19. The method according to claim 17 or 18, characterized in that LGMD2 is
LGMD2A.

20. A kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification characterized in that it comprises primers selected from the group
of:

a) those described in Table 1;

b) those described in Table 3; and

c) those including the introns-exons junctions of Table 2;

d) those derived from the primers in a), b) or c).

21. Packaging cell lines transfected by a recombinant vector, characterized in
that the vector contains a nucleic acid sequence as claimed in claims 1 to 4.

22. Use of a packaging cell line according to claim 21 as a drug for gene
therapy of an LGMD2 disease.

23. Pharmaceutical composition for the treatment of an LGMD2 disease
characterized in that it contains a component selected from the group of:

a) a nucleic acid sequence according to claims 1 to 4,

b) a cell line according to claim 24,

c) an amino acid sequence according to claims 5 to 9.



ABSTRACT
A nucleic acid sequence comprising:

1) the sequence represented in Figure 8; or

2) the sequence represented in Figure 2; or

3) a part of the sequence of Figure 2 with the proviso that it is able to
code for a protein having a calcium dependent protease activity involved in a
LGMD2 disease; or

4) a sequence derived from a sequence defined in 1), 2) or 3) by 
substitution, deletion or addition of one or more nucleotides with the proviso that 
said sequence still codes for said protease.

[Figure: sequences of normal and mutant nCL1 alleles]
Exon 2, family B519: normal AATCCCCGATTTA; mutant AATCCCTGATTTA (CGA -> TGA, Arg110 -> stop)
Exon 8, family M2407: normal AGCTGGTGCGGCT; mutant AGCTGGGGCGGCT (GTG -> GGG, Val354 -> Gly)
Exon 13, family R12: normal TCCTCCGGGTCTT; mutant TCCTCCAGGTCTT (CGG -> CAG, Arg572 -> Gln)
Exon 22, Amish families and family B501: normal CCATGCGGTACGC; mutant CCATGCAGTACGC (CGG -> CAG, Arg769 -> Gln)

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: AFM

(B) STREET: 13, place de Rungis

(C) CITY: PARIS

(E) COUNTRY: FRANCE

(F) POSTAL CODE: 75013

(G) TELEPHONE: (1) 45 65 13 00

(ii) TITLE OF INVENTION: LGMD GENE
(iii) NUMBER OF SEQUENCES: 4

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3018 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



TGATAGGTGC TTGTAAACTG TGCTTAACGA AAACATACCG TGTGCTGTAG GGACTTAACT 60

CTTGTTTATA TCAGTTAGCC TGGTTTCGCT AACAGTACAT CATTTTGCTT AAAGTCACAG 120

CTTACGAGAA CCTATCGATG ATGTTAAGTG AGGATTTTCT CTGCTCAGGT GCACTTTTTT 180

TTTTTTTTAA GACGGAGTCT CTTTCTGTCA CCTGGGCTGG AGTGCAGTGG CGTGATCTGG 240

GTTCACAACA ACCTCTGCCT CCTGGGTTCA AGCAATTCTT CTGTCTCAGC CTCCCAAGTA 300

GCTGGGATTA CAGGCACCCG CCGCCACACC CGGCTTATTT TTGTATTTTT AGTAGAGACA 360

GGGTTTCACT ATTGTTGACC ATGCTGGTCT CGAACTCGTG ACCTCATGTG ATCCACCCGC 420

CTCGGCCTCC CAAAGTGCAG AGATTAGAGA CGTGAGCCAC ATGGCCCAGC AGGACCACTT 480

FIG 8A/1
TTTAGCAGAT TCAGTCCCAG TGTTCATTTT GTGGATGGGG AGAGACAAGA GGTGCAAGGT 540

CAAGTGTGCA GGTAGAGACA GGGATTTTCT CAAATGAGGA CTCTGCTGAG TAGCATTTTC 600 

CATGCAGACA TTTCCAATGA GCGCTGACCC AAGAACATTC TAAAAAGATA CCAAATCTAA 660 

CATTGAATAA TGTTCTGATA TCCTAAAATT TTAGGACTAA AAATCATGTT CTCTAAAATT 720 

CACAGAATAT TTTTGTAGAA TTCAGTACCT CCCGTTCACC CTAACTAGCT TTTTTGCAAT 780 

ATTGTTTTCC ATTCATTTGA TGGGCAGTAG TTGGGTGGTC TGTATAACTG CCTACTCAAT 840 

AACATGTCAG CAGTTCTCAG CTTCTTTCCA GTGTTCACCT TACTCAGATA CTCCCTTTTC 900 

ATTTTCTGTC AACACCAGCA CTTCATGTCA ACAGAAATGT CCCTAGCCAG GTTCTCTCTC 960 

TACCATGCAG TCTCTCTTGC TCTCATACTC ACAGTGTTTC TTCACATCTA TTTTTAGTTT 1020 

TCCTGGCTCA AGCATCTTCA GGCCACTGAA ACACAACCCT CACTCTCTTT CTCTCTCCCT 1080 

CTGGCATGCA TGCTGCTGGT AGGAGACCCC CAAGTCAACA TTGCTTCAGA AATCCTTTAG 1140 

CACTCATTTC TCAGGAGAAC TTATGGCTTC AGAATCACAG CTCGGTTTTT AAGATGGACA 1200 

TAACCTGTCC GACCTTCTGA TGGGCTTTCA ACTTTGAACT GGATGTGGAC ACTTTTCTCT 1260 

CAGATGACAG AATTACTCCA ACTTCCCCTT TGCAGTTGCT TCCTTTCCTT GAAGGTAGCT 1320 

GTATCTTATT TTCTTTAAAA AGCTTTTTCT TCCAAAGCCA CTTGCCATGC CGACCGTCAT 1380 

TAGCGCATCT GTGGCTCCAA GGACAGCGGC TGAGCCCCGG TCCCCAGGGC CAGTTCCTCA 1440 

CCCGGCCCAG AGCAAGGCCA CTGAGGCTGG GGGTGGAAAC CCAAGTGGCA TCTATTCAGC 1500 

CATCATCAGC CGCAATTTTC CTATTATCGG AGTGAAAGAG AAGACATTCG AGCAACTTCA 1560 

CAAGAAATGT CTAGAAAAGA AAGTTCTTTA TGTGGACCCT GAGTTCCCAC CGGATGAGAC 1620

CTCTCTCTTT TATAGCCAGA AGTTCCCCAT CCAGTTCGTC TGCAAGAGAC TCCGGTGAGT 1680 

AGCTTCCTGC TTGCTGGCTG GGTTTCCCCC CCACGGAGGA GTCCTCTCAC TCAGCACCTC 1740 

CGGCAGCTCA GCTGTGCACA TGGGCACTGG GGGAAGGATC CTGGCAGCAG CTCTGCTGGG 1800 

CTCTGTCTTT AAGTGTGAAG CAGGGAGGAG AGGAACAGGT CTCAGATATT TCACCAAATC 1860 

TCAGCAAAAT CCAGAGGGAG AGCGCAGGAG GTGGGGTGAT TCTTATGCTC TGGCTCTTTC 1920 

TCTCTGAAAA AAAAAAAAAA ATCTTGCTTT TTATAAAAGT GGGTGGAACT CAGTTTAATT 1980

CATCCTGTAA AAATAAATAT TCCTTTCTCA GAACAAATTC CAGACAGCCC AGATGTACCT 2040 

GTTCGTTTTA ATATTATTCA TCTTGGTAAG ATTATTTCAG TTTCTCTGGC TAAAATCATG 2100 



FIG 8A/2
ATGTTATTCT TCTTTAATTT ACCAATGGCC ATTCTTTCTG AAACACAGAA ACCCTAGAAA 2160
GAGAAGAGTC ATAGGCAAGG AATTTTTTTC ATGCATAAAA TGTTGGGGTT AAAGAGAGAG 2220
AGACCTAGCA ATCGCTTTGG TCCACCTACC TCACCTCATA AGTGAGGAGT CAAGGCACAC 2280
TAGAGTGAAA TATATCTAGT GGGCACATGA CAGAGCCCGG ATTAAAACTT TGTTTTAGGA 2340
AACTCTCCCA GCCTCTGGGT TTCATTTACA GTGATCGCCA GGAGGGAAAT CACATTCCCC 2400
TGGCTCACCT CTCTGATCAT CCCTCCAGTG TGACTCTTGT TCTTAATTCG AGAAATATTT 2460
ATTGAGCATC TACTAGTGCC AGCACTGGGC AAGCAACTGG GGGGACAGCA GTGAGTAAGA 2520
AAGACCAAAA TTCCAGCTGT CTTGGAACCT AGGGTCCTGA AGGGAAGATG GGCATTGAAC 2580

AAGAGTGACA TTGTCAGGAG ACGATGTTCT GGGTGCCACA GGATCATGTG GCAAGGAGAG 2640

CTAACCTGGT CCAGGGAGAC AAACCCTCTC TGAGGAAATG ATGACAAGCT GAGACCCAAT 2700

ACTATTGATT AGCCATGGTT TTCTTTAACC TAAGGTGGGC CAGGCATGGT GGCTCATGCC 2760 

TATAAACCCA GCATTTTGGA AGGCCCAGGC TGGAGGATTG CTTGAGCCCA AGAGTTAGAG 2820 

ACCAGCCTGG GCAACAGGGT GAAAACCTAT CTCTTTTGTA CTAAAAATTC AAAAAATTAT 2880 

CCAGGCATGG TGGCACATGC CTGTGGTCCT AGCTACTCAG AGGCTGAGGT GGGAAGATCA 2940 

CTTGAACTCG GGGAGTTTGA GGCAGCAGTG AGCCGAGATC ATGCCACTGC ACTCCAGGCT 3000 

GGGTGACAGG AGTGAGAC 3018 



FIG 8A/3
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11451 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



GATCCACCCG CCTTGGCCTC CCAAAGTGCT GAGATTACAG GTGTGAGCCA CCACGCCCAG 60

CCGACACTGC CCTAACTCTC AAGTTGCATC CTTACTCGAA TAGTATGACA GTGTGGGAAG 120

CAGCATGGGA CAATGTAAAA AGGAGGCATG TTTCTGGCTT CTGCTACTTA CTAGCTGTGT 180

GTCTTTGCAC GAGTTTCTTA ACCTCTCTGG GCCTCAGTTT CCTTATCTGA AAAATAACAA 240

TGATAGTATT CCCTTCACAG GGCCAAATGG AATACTATCA GGAACACTAC ATAATGGAAC 300

TCAATAAATA ATAGCTACTG CGGCCGGGCG CGGTGGCTCA CATCTGTAAT CCCAGCACTT 360

TGGGAGGCCG AGGCGGGTGG ATCACAAGGT CAAGAGATGG AGACCATCCT GGCCAACATG 420

GTGAAACCGT ATCTCTACTA AAGATACAAA AATTAGCTGG GCATGGTGGC GCATGCCTAT 480

AGTCCCAGCT ACTCGAGAGG CTGAGGCAGG AGAATCACTT GAACCCCGGA GGCAGAGGTT 540

TCAGTGAGCC AAGATTGCAC CAGTGCACTG CAGCCTGGCG ACAGAGTGAG ACTCCGTCTC 600

AAAAAAATAC CTATCTATCT ATCTGTCTAT CTACTGTTAT TCTTACCTGG TCATTTCCTT 660

TTTGTTTCAC AGGAAATTTG CGAGAATCCC CGATTTATCA TTGATGGAGC CAACAGAACT 720

GACATCTGTC AAGGAGAGCT AGGTAGGAAA GTGCCTCAGG TCAGATCCTG CCAGATGATC 780

AAGGGGTGAT TACAAGGTGT GATCCCCTTC CAGGAGGTAA AGGGACAATC TGTGCTTGCT 840

TCCAGTAACT TTTTGGAAGA TTTTTTATAA CAGTTGCTTT ATGGTCGTTT ATCTACATGC 900

TGGCGATTGC TTCATTTCCT CCTACATGCC TCTTTAGCAC TCTGCCATGC ATCACAGGGG 960

GTATCTGCAT CCTGTGGCCT CCTCTCCAGT ATCTCAAGGA CACTTACATA CCCCACTCAG 1020

CATGACAAAA GCCCTGCTTT TCACTGTATC GTCTTTCTTG GAAGACAGCT CTGTGACTGT 1080

GCACCAAGCA TGCCCCTTGG GCATGGAGAT TCTAGATACA CACACAAAAG GCATCGCCAA 1140

GGAAAGCACT TGTAACTGGA ACCCTTGGTT TAAATTGGCC CAGCATAGCT CCATCTTTAA 1200

AAGAGTCTTT CCACAAAGAT GGCATCCGCC ATGTGGATGA GCATCCAATT TTCTCTTTGA 1260

TTGGTTAGCT TGACTGCTCC ATCTGATCTT CCTCTCTCTC GACCTCTTGT TCAGAAAGTA 1320 

TTGTCTTTGG TGTGGACTAT AAGCAAGCTC TGTGAAGTAA AATTGGAGAG AACACCAACA 1380 

GAAACAATTT AAATTTGAGG AAAAGGGGGC ACCTAAGACC AAAGGAATTT GGCTTATTTC 1440 

ATTCCAGAAG GGGAGGCTGA GAATAAATCA GATGAATATC TGGGTTCCTG CACCTGAGGG 1500 

AAGGCTTCCT GCAGAGCCCT GGGCATAATA ATCTGGGACC TTCAAACCAA TAACCTCTTT 1560 

TCCAAGGAAA GACTGGCTGC TTCCAAGGAG GGTAGGGGAG AGTCGGGCTG CAGGCAGCTC 1620 

TCAAGTCTCC CCTTGCACAC TCTCAGGTTG GCATTTTCAC TTTAACCCAT CCTCCCTTAA 1680 

GAAGGCAGTT CTTTGTGACC AGGGTACACC CCCTATTATA TATATATATA CACACACAGA 1740 

GAGAGAGAGA GAGAGAGAGA GAGAGAAAGA GAGCAAAGTG TTACCTCCAA CTACATACAG 1800 

TACTCTGTCA GAAAAGAGGT TCAGAGAATA AGAAAACGTC CCGAGCTCAT TCCGTTGCCA 1860 

GCAATGTCTT ACTGCCCCCT ATAGACGGGT TCCAGGGCAG CTGCCTACCT GGCCTTCCTT 1920 

CCAATACAAA TCATCTTGGT GGATGGTTCT CTGAGGCTCA GTCTTCGCTG AAGTCAGAAG 1980 

AGGAATTGGA CTCACATTGC AAAGGCACAG GGCAGGGCAG ATTTCCTACA GGTGTTAGGA 2040 

AGAACAACCC AGTTATGATC ACCTACTGCT CTGTCTCCAT TGAGGCCTAA AAAGGAAGTG 2100 

AGTTTATACT GCAGTTGGAG GAACTGCCTG CAGCCTTGAG GAAAATGTCT AGTCACAAGG 2160 

GAGTAAGTTA CCTGTTGATC ATATTGTCAA GGAATTCCTG TCCAATTCTC CTTCCCTGGG 2220 

TTGACACCTC TGTAAGGTCA GATCTGGAAG TAGGAGAGTG GGCACCAAGG GAGTCCCCGT 2280 

TCAGGGAAGT GGAGTGGCTG GCTGGGATTG GGGCTTTTTC TTCCCAGGAG GAGCAGGAGT 2340 

GCTCACGATC TGTGCCCTGT GTCTGCCTGC AGGGGACTGC TGGTTTCTCG CAGCCATTGC 2400 

CTGCCTGACC CTGAACCAGC ACCTTCTTTT CCGAGTCATA CCCCATGATC AAAGTTTCAT 2460 

CGAAAACTAC GCAGGGATCT TCCACTTCCA GGTGAGGTAA TGAGAGTGTA GTTAAGAGGG 2520 

CCAGCGGCAG GCCACCCACC GCTGGTCTCC TGGCCTTGAC TTCCCAGAAG CTGGAGGAAA 2580 

CTTCCCACCC ATCTACCCGC AGCGGCAACA GTCGGCATGG ACCCCCTTAA GGCTTCAAGC 2640 

CTGGGAGGAA GCAGTTGCTT ATCTCTGGCT CCCTAATCCC TCCCCCACCA CCTTCCACTA 2700 

TGTCCCAGAA AGACAGGAAG ACATCCTGTT TACTGTGGGT CTATTTTTGT CTTTGCAGCT 2760 

GTCTGCCTGC TTTTATTGCC TGCAGCCCTT CTCAAGTAGG TCCCTAAGAT ATTAGCACTG 2820 
TGACACCACA GGACCCTTCA GGTTGTACAG GAACCCCTGT CCAGGGCTCC TGTATACTTC 2880 
TTCCTCTCTA AGGCATGGCG GTACCAAGGC TATCACTCCT CTCTTCCAAG CCCTGGAAGA 2940 
AGAGTCTGCT TAACCTGGGG ATCAGGCTTC TTGTTTGCCC TAGAACTGAA TCTGATGGTT 3000 
CTAGAATCCA TCCAGCTACT GGAAATTTTC TGGGTCCCAG TCACCTTGGC ATAGAGCTGG 3060 
TGCTAGAGCA GAACCAAACT GAATTCTACC TGTGAGGGTC TCGTAGCTTC CGGGATGCTG 3120 
GGGAGTCAGC CTGTCTCCAG CTTCAAAGGC TCCCTCATGT CCCAGGATGA CCCACATTAT 3180 
CAGTTCTTGC TCCCCGGGTC TTGCACCTCA GCACGGAAGG CCTCAGAAAA GGTCTGTCTC 3240 

CAGGCTCAGA CTCCCCCTCC TGCCGCCTTG GGAACATGGC ATATTTAAAG GGTCTCAGAT 3300 

CTAAAGGGCC TTACATACAA ATATCAGATA GATTTCTGTT CTCATTTCAA TGAGGGAGAA 3360 

AGTGCCATTG AAAAGGAGAC TAAACCACAT TTGGCCCTTT TCAGTTCAAA CTGATTCATT 3420 

CAAAAAAGAG CGACATCCAA ACTTGAAATG ATTGAACAAT GTTCCTGCTA CAGCTAGAAT 3480 

AGATTCTGGG TCACTTTGTT CCTCCGTTTC AATCCTTGTT CTTCAGTTTG GCATCAAGAA 3540 

ATACCTAAAT CAGCACAGTG CCTTCACTGC ATAGTTCCCA ATCCTGGCCA CATTGAATCA 3600 

GCTGGGGGCA CCTGAGAGTG CTGACACCCA GGCCCTGCCC CAGACCTGCT GAGCAGGAGA 3660 

ATGAAAATCT TACATCCTAA GACACTCATG GAGCACCTAC TCTACCCATT ACTGGGCTGG 3720 

ACTCTGTGGA AGACATGAAG TATATGTAAC TCACTTCCAG CTCTCAAAAA GCACCCAGTC 3780 

CAGTTAGAGA CAGATTTACA CACCCCAAAC ACAAAATAGG ATGAACAGGC ACCCAGATGC 3840 

AGAGTCCAGG AAATGATGCT GCTTTGGGAT TCAAGAACCC CCTGAGGAAT GTGGAGGAAG 3900 

GACACATTTC CTAACAGTAA TTTGAGTATG TGACTCTGTG CGTGACGCTT CTGTGCAGTT 3960 

CTGGCGCTAT GGAGAGTGGG TGGACGTGGT TATAGATGAC TGCCTGCCAA CGTACAACAA 4020 

TCAACTGGTT TTCACCAAGT CCAACCACCG CAATGAGTTC TGGAGTGCTC TGCTGGAGAA 4080 

GGCTTATGCT AAGTAAGCAA CACTTTAGAA TGTGAGGTGG GGCTAGAGGT GAGAAAGTGG 4140 

GTTGCAAAAT CCAGCCGAGA CCTCACTCAC AGGAAGAGGC ATGTGCCTCT ATACGTGCAT 4200 

ATGTGTGGGC ATGCAAGTCC AACTGTGACC CAAAGTTAGA GATCAGTTCC AGGCAACAAC 4260 

AGCTCTAACT AAAAACATTA AATTTAAGAG TAGAAATGAA GATTTGCATA GAAGACCTTT 4320 

AGCTTTAGCT CACCATAGCG AGTTCTTTCA TTGCACCTCC ATGGTGGCAT TGCAAGTCTT 4380 

GGGATCAGAG CATTGTCCCA GGGTCTCGAT TGGCTCAACC TCATGTGCTT ATAGAAGATT 4440 
TATAAAGACA TGTTGTCTCT CAACTTAAAA GCTCCACCCC AGATGATAAT AATGGATTTT 
CAAATTTTGG AACAAGGTCA CTCTGTAATG CAGGCTGGAG TGCAGTGGTG CAGTCACGGA 
TCACTGTAGA TTGACCTCCT GGGTTCAAGG TGCTCCTCCC ACCTCAGCCT CCCAAGTAGC 
TGGGACTACA TGCGGGCATC ACCATGGCCC TTTTATTTTT GTATTTTTTT GTAGAGCGGG 
GTTTTCCCAT GTTGACCCAG ACTGTTCTCG AACTCTTGGG CTCATACAAT CCACCAGCCT 
TGCCCTCCCG AAGCGCTGGG ATTGCCGCTG TGAGCCACCA CACCGGCAGC TGCTAATGGC 
TTTAATGCAG CCCTTCCTCA ACGTTCAGGA TGTAGTGGAA AGAGCTCTCA GGAAGTGGGG 
ATAGCTGGGT TTCAATCCCA GTGCTTCTGG CTCTCTGTGG TCTTGGGTGG GTCACTTAGC 
CTCTTGAGCT CAGTTTCTTC ATTATGAAGA AAGGGAATCA TTGTTTCCAT CCCATGAGCT 
CATAGGGTTA ATGTGGAATT GATGAAAGAA CATCACAGCA TCCAAGAGGT AAAGTTCTGG 
TGGCAGTGGT ACCTGGGTTT TGTTCCCTGG AACTCTGTGA CCCCAAATTG GTCTTCATCC 
TCTCTCTAAG GCTCCATGGT TCCTACGAAG CTCTGAAAGG TGGGAACACC ACAGAGGCCA 
TGGAGGACTT CACAGGAGGG GTGGCAGAGT TTTTTGAGAT CAGGGATGCT CCTAGTGACA 
TGTACAAGAT CATGAAGAAA GCCATCGAGA GAGGCTCCCT CATGGGCTGC TCCATTGATG 
TAAGTCTGGG GTGTGGGGCA CAGGGTGGGG AGCTCCAAGT GTCAGGAAGC CTTTTACCCA 
ATGAAGGGCA GCATAGAGCT TTTGTGTGGG ACAGAGCGAA TGTTTTGTTT GAGGAAGCAG 
GAACTGGCTC TCAACTTTGA GGACTGGGAA TTTCTCAAGG GAGAACAGTT CTTCCGGATT 
TTCAATAAAG ACACTGGTCA AGGACATTTC AAGCCCTGGA ATGTCAGTGG AAATCAGTCC 
AGAGGCCTGT GTCAGTGGAG GCCTCCCTTG CTGGTGCTCC TCAGTCTCAG CACGCTCCCA 
TTAAGCTGGC CACGTACTTG GCTGTGGACC TGAGCCCACC ATTTCCCTAA GAAAGCCTCC 
CAGTCACTGG GCTTTCACCA CACCTCCCCG CTTGAGACGT GGGCTTTGTG TTGTTACCTG 
GGAGAAGCTA AGCCTGCAGC ACCTTTCAGT GCAAAGAAAT GCTGTGAACT GAGACAGGAG 
CCAAGGGTAG GGAGATGGCC GCCCATGGCC AGGCCTCCTT CAGGGGGCAT GCCTTCCCTG 
AGGGCTGCTC ACTATATTGA TATGATAATC TTAGTGGTTT CCATTGGGGA GGATGGGGCT 
GAAGCTGAAT TCCTGCCCCT TCTTCTCCCA ACACGCCCAA TGGACAGCTT GGAAGGTCAG 
TTAGCACACA ACACCATGGA TGAACTTTTT TTCTGTATCA CTTTTCTCCG TCTTTCCTCC 
ATTCGTGCTC TGTTGATCTC TCCTCTCTCC CTTTGTCTGT CCCATCTCTT TCTCCTCTCT 
CCTTCCCTTT CCACCCTTCT GTGTTTGTTC TCTCCCTCCC CTGTGTTGTT CCCTACATTC 6120 

TCCATCGGGC CTCAGGATGG CACGAACATG ACCTATGGAA CCTCTCCTTC TGGTCTGAAC 6180 

ATGGGGGAGT TGATTGCACG GATGGTAAGG AATATGGATA ACTCACTGCT CCAGGACTCA 6240 

GACCTCGACC CCAGAGGCTC AGATGAAAGA CCGACCCGGG TGTGTACACC TCCGATTATC 6300 

AGAACTGACC ATCCCTCCAA CCCACATGAC CCCGCCCTAT TAGTGTCAGA CTCCCCTCAG 6360 

CAGCCAGGGC CTTACCCACA CACCCCCACC TGGCACCTCC CAAGGGTCTG GGTTGAAATA 6420 

ACTTGCTCAG CCAAGGCTCC TGAAGAGGGT GCAAGAACCA GGATTTTGGA GGGAATCTCT 6480 

GCTGGAGTTT CTGCATATTC CATGGTCCAG GCAGTTCCTC TCATAACGAA CTATGAGACA 6540 

GAAATACTTG TAAAGATACT TCATTTATTT TGAAATATTT TTCCTCTTCT AATGTATTCA 6600 

TTTATTCATT CAACACTTAT TTTTGAGCTC CTACTATGTT CCAGGCACTC CTCTAGCAAA 6660 

CAAAGCAAAT TCTCTCCTCT TTTTCAATAT TTGTGGAAAA AGCAAGGTCT CCCTCTTGTA 6720 

GAGTTTATAT TCTAGTATTT TCATAAGTTA TACCTGCTCA CTGGAGAATA CTGAGCCATA 6780 

CAGAAAAACA CAGAGGAAAA TTTCACTTAT ATTTTTCCCC ATGTAAAGAT AACCACTCTT 6840 

AACATCTAGT ATATGTTCTT CCAGGATTTT TCTATGCACA CACTGAATCT GTATTTTTAT 6900 

TTTTAAAATG TTATCATATT GTATGTACCT CTTTGCAGCC TGCTTTTTTC AGTTAGTTTT 6960 

TTTGGTTTTT TGGTTTTTTT TTTTTTTTGG AAACCAAGTC TTGCTCTATT CCCTAGGCTG 7020 

GAGCACAGTT GTTGCCATCT CGGCTCACTG CAACCTCTGC CTCCAAAGTT AAACTAATTC 7080 

TCCTGCCTCA GCCTCCCGAC ATAGCTGGGA TTACAGGCAC ACACCACCAC ACATGGCTAA 7140 

TTTTTGTATT TTTTAGTAGA GACGGGGTTT CACCATGTTG GCTGGAATGG TCTTGAACTC 7200 

CTGACCTCAA GTGATCCACC TGCCTCAGCC TCCCAAAGTG CTGGGATTAC AAGTGTAAGC 7260 

CACCACACCC GGCCTAGTTT GATATTCTTA ATGTGCCCAA AGTATTCTCC TGTAACATTT 7320 

TTTAATAGCT ACACAATATT CAAACACACA GATATGTTAT AATTTATTTA CCCAATACCC 7380 

TATTATTGGA AAGTTGAGTT CTTTTTTTTC TTTGTTTTGT TTTGTTTTGC TACTATTCTA 7440 

AAATGCTATA ACGAACATCC CAATAGATAC ATCTTTGTAT ACATCCATGG TGACTTCCAT 7500 

AGGACAGATT CCCAGCAGTA GAATTGCTGG GTTGAATGAT ATGCTTAGGG TAATGACAGA 7560 

AGAGTCATTT CAAGCAGCTT CCTAGGGTCT TAGAACTTAA GGATTAATGA GTCTTCCCGC 7620 

CCCCTCCCAG TCTATTCAGC ATGATCTGGA TCATGAGGAC TGAGATCTGG AAGAGACTGA 7680 



FIG. 8B/5 






GATCTGGGAG AGGCTGAGAT ACCAAAAGCC CTGGCTCCAC CCATACCCCT CGCCCTGAAA 
ACAGCTCTAG GAATTCCGCG GCCTAGCAAG GCTCCGGGAA GCTCCTTTTA AAGCTGTGAC 
GTTAGTAGGC ACATGGACCA TAGAGACCTA TCCAGGGCTC ATGGGACTTT AGTGATCCTG 
CCCTTCTCCC AAGGATCCCC CATGGCTGCA ACTTGGAAAT TTCTGCAAAT GGAAGAGCTA 
CTCCTTAGGC ACGGTCATGT CTGAGCAGGG ATCTCCTCGG GCTTTCTTAG AATTCTCTCC 
CTGGGCACTG GGACTCTTGA TTTCTTGAAT ATTATGTTCC AGGTGGGTGT GGAGGAGGTG 
AGGGGATGTA AAGAAGGCTA GACTTGGCCA GGCGCAGTGG CTCATGCCTG TAATCCCAGC 
ACTTTGGGAG GCTGAGGCGG GTGGATCACC TGAGGTCAGG AGTTCGAGAC CAGCCTGGCT 
AACATGGTGA AACCCCGTTT CTACTAAAAA TACAAAAAAT TAGCTGAGCA TGGTGGCACG 
TGCCTGTAAT CCCAGCTACT CGGGAGGCTG AGGCAGGAGT ATCGCTGGAA CACGGGAGGC 
AGAGATTGCA GTGACCCGAG ATCGCGCCAC TGCACTCCAG CCTGGGCGAC ACAGCAAGAC 
TCTGTCTCAA AAAACAAAAA AGAAAGAAAA AAAGGAAAAG CTAAGACTTA CATGTGTCAC 
TTAACCCCTT TTCTCAAACC TCTTTCTCTT CCAGGAATAG TCAACCCCTG GATGGCTTCA 
GGGGAAGGGG GATCCTGAAG CCCAGGGCAG CCTCCAACTC TACCCCTTCC TCCTTTGAAG 
GATACTAAGG GGTCCAGAAA GGAGGGGCAG GACACTGTTA CCCACCCCAC ATCCCAGCAT 
CCACATTGCT CTCTGATGGT CAGGACAGAG CCTTCTCAGG GAGACCAGCC TGTCTGGAGC 
TGTGTCTCTT GGCACTCTTA AAGGGCCACT GAAGGTCCGT TCGTGGTCGT GAGGCACACT 
TTCAGGGAGC AGAGTGGTCT GTGTCTTCAC AGAGCCCGGA AAATGAACTA GTATGAACTT 
TGCCTCCAAG CAGCAGAACT TCTGTTCCCC CGCCCCTAAT GGGTTCTCTG GTTACTGCTC 
TACAGACAAT CATTCCGGTT CAGTATGAGA CAAGAATGGC CTGCGGGCTG GTCAGAGGTC 
ACGCCTACTC TGTCACGGGG CTGGATGAGG TAAGCCTGGT GGGGCTTGGT GGGGCAAGGG 
CACCCTCCTG GGTTAACCTC ATGAAGTCAG GACTTAGCTG TTGGGGCCCC TGCCCTGTCT 
GCAGAGCTTG CCTCCAATCA GGACATTCAG TTCAAGGTCC AAGCCACGCC TGGGAGCAGA 
GGGGCCTGTG AAACTCGTAG AGGTGGATCC TGCCACAGTT GGTGCACAGT TTATCTTTGC 
TTTTCGTGCT AAAGATGGCA ATTTTTCCAA CATTTCCAAT GAACAAATTG AAATATCACT 
TAACTTTGCT TTTACAAAGT TGGTTTCATG TGTTCTTGAG CTTCCTGTTC TCTCGTGTTC 
AGATAGCTAC AGTTGTCTCT GGGTAGCCAC GGGGACTGGT TCCAGAAGCC CCAACAGTAA 
CAAAATCTGC AGATGCTCAA GTCCCTTCTG TAAAATGGAG TAGTATTTGC ATATAACCTA 9360 

TGCACATCCT CCCATATACT TTAAGTCATC TCTGGATTAC TTACGATACC TAACACAATG 9420 

GAAATGCTAT GTAAATAGTT ATTGCACTGC ATTGGGTTTT TTTGGTATTA TTTTCTGTTG 9480 

TTGTATTATT ATTTTTTCTT TTTTTGAATA TTTTTGATCC ACAATTGGTT ATATGCCAAA 9540 

GCCATGGATA CGAGAGGCTG ACTGTTCTGT TTTGCTCCTT CTGGGACTTC TGGGTTTTCC 9600 

TGGACCATGT CTGAGACAGG AACGTTGTAA GACCTGTTGC ACACAGTTGG GCAGGTTGTG 9660 

CCCTGTACAG AGGGATGGGC TGAGAGGGGC AGTTGCCTGC ATCACCCATT GCAGCAGACT 9720 

GGAGGGAGTC TGCTTGTTTG TAGTTCCTCA GTCAGCAGGG GCCTTTTGTC TTTCCTTCCT 9780 

TTCCTTTTTT TTTTTTTTTG AGACGGAGTC TCACTCTGTT GCCCAGGCTG GAGTGTAGTG 9840 

GCACAGTCTC GGCTCACTGC AATGTCCGCC TCCTGGATTC AAGCGATTTT CCTGCCTCAG 9900 

CCTCCTGAGT AGCTGGGATT ACAGGCGCGT GTCACCATGC CCAGCTAATT TTTGTATTTT 9960 

TAGTAGAGAT GGGGGTTTCT CCATGTTGAT CAGGCTGGTC TCGAACTCCT GACCTCGTGA 10020 

TCCGCCCACC TCGGCCTCTC AAAGTGCTGG GATTACAGGC GTGAGCCACC ACGCCTGGCC 10080 

AGCAGGGGCC TTTTTTCTAA TTTATATGAA GACACCTAAT TTATATGTGT TAGCAAAGCC 10140 

CTCCTGTTTA TGCCTCACCT CCTCCCCCGA AGCTCATACG GCAGGATGTT CCTGAGAAAA 10200 

TTGCCTCTTA GAAGATAGAG AGGAGATGCC AAGCCTAAGT TAGGCAGACT CAGGAGGATA 10260 

GGTCTGACCC ACCCCCTGCC ATTCCCCAGC ACACTTGTGA TTAATCTCCT TGGCCAGAGC 10320 

CAGGCAGAAC ACCCTCGCGT AAGAGATTTG CCCCCCAGCC CCGTCCCAGC CCTCAGCTAG 10380 

ACAGAAGATT CCCTTTCCAG AGAGGCTGCA GAGCATGAGA GCTCTTTCTG TGTGCTTAAG 10440 

GTCCCGTTCA AAGGTGAGAA AGTGAAGCTG GTGCGGCTGC GGAATCCGTG GGGCCAGGTG 10500 

GAGTGGAACG GTTCTTGGAG TGATAGGTAG GTGAGGGGAC CCCACGGGAT TGGCGGTGGC 10560 

GGGGAACAGG GTCCGGGACA AGGCTGTGTT GGGAACTGAG CCATGAGAGT ATTGAAGATG 10620 

CTTGGTATAA AATCACCCTC AAAACCAATG ATCCGCAGAG AAGAGGGGCA CAGGTGTTGG 10680 

CTCCAGGGAA GGGCCAGGAG TGGAAGCGGG GTGCTGGGGA CCCAGAGAGG TTGCTGACAA 10740 

CCATTGGCTG GAAAGGAAGG ATTCCAGAAA GCGTGGGGAA GGTCCAGGCA GGAAAAGCGT 10800 

ATGAATGCAG GGTTCTGGGC TAGAGAAGTG ACTTCCCTTC TTGGGGTCTT GTGTTGCCTT 10860 

TCCTGTGAAA TGGGAACAGT ATTATTAGCA CTTACCTTGT GGGCTGATAT TGAGGAGTAA 10920 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1834 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



ATTTTTTTTT            GACGGAGTCT CACTCTGCCA CCCAGGCTGG AGTGCAATGG 60 
CGCGATCTTG GCTCACTGCA ACCTCCGCCT CCCGGGTTCA AGTGATTCTT CTGCCTTAGC 120 
CTCCTGAGTA GCTGAGACTA TAGGTGCCCG CCACCACGCC CAGCTAATTT TTGTATTTTT 180 
ATTAGGACGG GGTTTCACCA TATTGGCCAG GCTGGTCTCG AAATCCTGAC CTTGTGATCC 240 
GCCCACCTCG GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCATTGCG AGCAGCCCAG 300 
AACTCAATTC TTAACCTTTA AAGTATGATG AGAAGAAGGA TCAAGCCCTC ACCAGCCCAT 360 
TTAAGGAGTT TAGGCTCAGT CTTGAGGATG TGAGAAGTCA TTGCTATTGG GTTTCACACT 420 
GAGGTTAACA GGTGAAGTCA GCATTTTGGT AGTTCACAGC AGCTGCAACT CTTTGTATTT 480 
CTCTGATACC TCCTGTCCCA ACCTACATCA GGCCTTCCCT TCTTCCTGCT TCCTTAATTC 540 
CTCCATTTTC CCACCAGATG GAAGGACTGG AGCTTTGTGG ACAAAGATGA GAAGGCCCGT 600 
CTGCAGCACC AGGTCACTGA GGATGGAGAG TTCTGGTGAG TCCAGAACCC AGGAAGACCC 660 
AGAAGGGTAA GGGTGGGGAA GAGAGGGGAA ATCTCAGACC TCAGTCCCCA GCTAAGGTTA 720 
TCAGATTCCA GCCCTTGGGA GATCTTGGCT GTGTTCTCCT CCAGCCCAAG GCCCAGCAAG 780 
GATGAGGTTC TGAGAGGAGC CTTCCAGGCC ACAGGGACAA TGAGCCCAGG ACCAGGCCAA 840 
CATGACATGG CTCTTGCCTC CTGTGTGCCC CTCCGCCACA CACTCTATTC CAGCCACACG 900 
CACCCTGGCC TTAGCACAAT TCTTTTCTGA GCCTAGGAAG CTCCACTTAC CCTGATCTTC 960 
CAACGTCAAC CTCACCCTCT CTCAGGTTGT TTCTATTCAG GCTTCAAGTC TCAGCTTAAG 1020 
GAGAATTTTC AAGTCTCAGC TTAAGGAGAG CCCCCTAAGT TCCCCGAGGA CTGGGATTAA 1080 
TTTATGATGC TCATCACCCT TAAAATTGTT TGCTTAAGCC GGGCGCGGTG GCTCACGCCT 1140 
GTAATCCCAG CACTTTGGGA GGCCGAGGTG AACGGATCAC GAGGTCAGGA GATCGAGAAC 1200 
ATCTTGGCTA ACACGGTGAA ACCCTGTCTG TACTAAAAAT ACACAAAAAA AGTAGCCGGG 
CGTGGCAGCG TGCGCCTGTA GTCCTAGCTG CTGGGGAGGC TGAGGCAGGA GAATCACTTG 
AACCTGGGAG GCAGAGGTTA CAGTGAGCCC AGATTGCGCC ACTGCACTCC ACCCTGGGCG 
ACAAGAGAGA CTCTGTCTTG GAAAAAAAAA AAAAAATGTG GTCTTAGTTT AATGTCAAGG 
GAAAGGTTTT GGGTGTTTTT ATTACTTTAT TTTTTATTTA AAAACTATAA TAGAGACGGG 
CCTCGCTATA TTTCTCGGGC TGGTCTCAAA CTCCTGGGCT CAAGCGGTCC TCCCACCTTG 
GCCTCCCAAA ATGCTGGCAT GTGGGCCTGG TCAACATATG GGACCCCAAC TCTACAAAAA 
ATTTTAAAAT TAGCCAGATG TGGTGGCGTG TGCCTGTAGT CCCAGCTACT TGGGAGGCTG 
AAGCAGGGGG TCACTTGAGC CCAGGAGGTT GAGGCTGCAG TGAACTATGA TTGTCGTTCA 
CTTTTCTTCT GAACGTGAGA TTAAGTGTAG TCAGCAATTT GGCTTAGGAT TATTTATTCA 
GAATTTTTAA CCGTCACGTT GCGGCAAACC AGGT 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14664 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

AGGAGGTGGA GGTTGCAGTG AGCCAAGATC ATGCCACTGC ACTCTAGCCT GGGCAACAGA 60 

GCGAGACTCT GTCTCAAAAA ATACACACAC ACACACACAC ACACACACAC ACACACACAC 120 

ACACACATAT ATATACACAC ATATATATAC ACACACATAT ACACACACAC ACGTCTGTAT 180 

ATATATGTGT GTGTGTATAT ATACACACAC ACACTATTCT ATATATTCTT GTAGAGCTAT 240 

GTGTGTCTCC TGTGCTATTG AGCATGAGCC CTTTTTTTTT TTTTTTTTTT TTGAGACAGA 300 

GTCTCACTTT GTCGCCCAGG CTGGCATACA ATGGCGCAAT ATCGGCTCAC TGCAACCTCC 360 

GCCTCCTGGG TTCAAGTGAT TCTCCTGCCT CAGCCTCCCA AGTAACTAGG ATTACAAGTG 420 

CCCGCCATAA TGCTCAGCTA ATTTTTGTAT TTTCAGTAGA GATGGGGTTT CACCATGTTG 480 

GCCAAGCTGG TCTCAAACTC CTAGCCTCAG GTGATCCACC TGCCTCAGCC TCCCAAAGTG 540 

CTGGGATTAC AGGCATGAGC CACAGCACCC TGGTGAGCAC TAGAGCTTAT TTCTTCTATC 600 

TAACTGTATT TTTGTATCCA TTAGCCACCC TCTTTTCATC CTCCCCTCTC CTTCCCTTCC 660 

CAGCCTCTGG TAACCACTGT CTGCTCTCTA CTTCCATGAC ATATGCTTTG TTTTAGCTCT 720 

CACATATGAG TGAGAGCATG CGACATTTAT CTTTCTGGCC CTGGCACATT TTTGAATCAT 780 

TGTTAGAAAA GATGATGGTT TGGAGTAGAT ACATCAGAAG TGACAGCGTT TGCCCTAAAA 840 

AGGAAAGACA GGCTCCTCTG GGACCCTGAC CAAGTTCCTG TGAACTATTT TATTATTGTG 900 

CTGTGTTAGT CCTGGGGTCT TCCGTTCCCA GCCCTCCTCA CCTGCTCCCA TATGGCTCTC 960 

TCTCTTCTTC CAACCTCTCA GGATGTCCTA TGAGGATTTC ATCTACCATT TCACAAAGTT 1020 

GGAGATCTGC AACCTCACGG CCGATGCTCT GCAGTCTGAC AAGCTTCAGA CCTGGACAGT 1080 

GTCTGTGAAC GAGGGCCGCT GGGTACGGGG TTGCTCTGCC GGAGGCTGCC GCAACTTCCC 1140 

AGGTGGGAGA TGCTCTTGAT GGGGGGAGGG TCTAAGCCGA AAAAGTTCCA GGCAGAAGAA 1200 
GCCTAACTAG TGCTTATTAA GTCTCTCTGT TCCAGACGTC CACTATCTTA TTAAACCTTC 
CCTGTTTTAC TGAGAAGGAA ACCACCATGC TGAGAAGTTT GCAATAGGGA GCTGGGTAGC 
AACTTTGGAA GCAGGAACTT GTGGGAACAA TGCAGATGCT GCTTGGACTT ACGATGAGGT 
TATGTCCAGA TAAGCCCATC CATCTTTTGA AAATACCCTA AGTGAAAAGT GCATCCAATA 
TGCCTAACCC CCCAAACCTC ATAGCTTACC CTGGCCTACC CTCAAACATT GCTCGGAACC 
CTTGACCTTA AGCCTAAAGT TGGGCCAAAT CATCTAACTC CAAAGCCTAT TTTACAAAGA 
AAGTTGTTGT AATATCTCCA TGTAACTTAC TTAATACTTG TACCTAAAAA GTGAAAAACA 
AGAATGGTTG TACGGGTACT CGAAATCCAG TTTCTACTGA ATGTGCATCT CTTTCACATT 
GTAAAGTTAA AAAATTGTAG CCGAACCATC CTAAGTCAGG GACTGTGAGT ACTGTGTCAG 
TAACAGTAAG GGCACTATTG GAGAACCAAG TTAGCAGCTG CTGCAATAGT TCAAGTCAGA 
GATGATGAAA ACCTAGACCA AGTCAGTAGC AGCAGAGATG GAGGGGAGAC AGCAGATTTA 
GGGAGAGCAT ATTGGGTGAT GTAGGGAAGG AAGAAGAATG ATGTCAAGAT TCCCAGTTGG 
GGACCTGACA ACATTGCAAC ATAAGACACA CAAGAAGATC GGGTGGGTGG CTCATGCCTA 
TAATCCCAGC ACTTTGGGAG GCAGAGCCAG GAGGATCACT TGAGCCCAGG AGTTCAAGAC 
CAGCACAGGC AACATAGTGA CACCTCATCG TTACCCAAAA TAAAAAAAAA AATGAGGTGG 
GAGGATTGCT TGAGCTCGGG AGGTTGAGGC TACAATAAAC TGTGATCATG CCACTGCACT 
CCTGCCTGGG TGACAGAGTG AGACCCTGCC TCAAAAAAAA AAGACACACA AGAGAAAAAT 
ATCAGCGTGT TGTTTGTTTT TGGTGGAGTT AATTGTGGGG TTCTAGGGAA AGGAATTTAG 
CTTGGGACAT GGAAAGTTTG AGGTTCCTGT AGAGTGTCCC AGTGAAGATT TGTAATAGAG 
CATCGGATGC GCATATTAGA TGGCACTTGG TGATATGATA AGAACTCAAA AAATATTTGA 
GGAATAAAGG AAAGAAGAGG CCAGACGTGG TGGCTTATGC CTGTAATCCC AGCACTTTGG 
GAGGCTGAGG CAGGCGGATC ACTTGTGGTC AGGAGTTCGA GACCAGCTTG GCTAACATGG 
TGAAAACCCA TCTCTACTAA AGATACAAAA ATTAACCGGG GATGATGGTG GGTGCCTGTA 
ATCCCAGCTA CTTGGGAGGC TCAGTCAGAA GAATCGCTTG AACCCAGGAG GCGGAGGCTG 
CAGTGAGCCG AGATCGCGCC ACTGCACTCT AGCCTGGGCA ACAGAGCCAG ACTCCGTCTC 
AAAAAAAAAA AAGTGAGAGA GATTGAGGCT GGGATATATG GCTCAGGCAT CATGCGCGTG 
TAGGGGGCAG TTAAAAAGCA GAAGTAAGAA AGATTGCCTA GGGAGGCAGG AAGGGTGAGG 
TGAGAGGAGA AGAGGCCCAG GACCAGATTC TAGTCACCAA CAGCGTTTAA GGGGCAGGTA 2880 

AGGAAAACAA AACCATCAGC AAAGACTGAG AATGAAAGCC CAGAGAGGAA GGAAAAGCCA 2940 

CACATACAAT CAGTACAGCT CCATCTGAAT AAAGGTAGCG CCCCCCCCCC CCCAAATCAT 3000 

TAGAGAAATG CCTGATTCGG TTTTCTGTGG ATTTTTCCTA AGAACCTAGA TGTGGGGAAT 3060 

AGAAATAAAT GGTTCCCTCT GTCTCATCCC CTCCCTGCCC TCTGAGAGGA AGCTGTGATT 3120 

GCGTGCTCCC TTTCTGGGGG TGCAGATACT TTCTGGACCA ACCCTCAGTA CCGTCCGAAG 3180 

CTCCTGGAGG AGGACGATGA CCCTGATGAC TCGGAGGTGA TTTGCAGCTT CCTGGTGGCC 3240 

CTGATGCAGA AGAACCGGCG GAAGGACCGG AAGCTAGGGG CCAGTCTCTT CACCATTGCC 3300 

TTCGCCATCT ACGAGGTGTG TAGTCCTGAT TGGCTCCAGC CCAGGAAACA TACTTTCCCA 3360 

GAGAGGACGC TTCCAGGGGC TTCTAGAGGG GCCCTCTGCT TCCTCAATAC CAGTGACCCA 3420 

CAGAGCTCCT GGTATCAGGA CCACTTGTGT TTGTAACAAG CAAAAAATAC CAGGGGGGGC 3480 

ATTAGAGAGG CAGTGGAGCG GGCCTGGCAG AACAGGTGCC TGGGGGTCAG GCTTCCGCAT 3540 

GCGGGCTGCA GTTGCTGGCA TTGCCTTCCG CAGGCTCCTC ATCCTCATTC ACATCTGAAG 3600 

CATCTTCCTT TCTGTTTCTT CTCAAGGTTC CCAAAGAGGT ATAGCAGCAG CAGCGGCCAG 3660 

CAGTTGTGTG CAGCACTACC CAGGGGGGCC CGAGTCTGTC TGTGGCTCGT CGAGAAGCTT 3720 

CCTGGTGGGG TTTGTGGGCA GGACTTGTGA TAGGAGAGGG CCTTGCCTGT TGTTATTTCC 3780 

CACTTGCAGA GCAGGTTGCC TCAGGGCATT GCATGACCCA TGACTACCAC CCCCAGGATG 3840 

TGCACTTTCT CCCTCGCACC AGACACTGCA CGTCACACAC ATGCCTTTGC ACACTCACCC 3900 

TCCTCCACGC TTACAGCCAC ACACACAGTC ACACAGACGC GTTCTGAGGG TGGCTGCCCG 3960 

CTTGGGATGG AGGAATCACT TCCCTCAGAA CCCAGCCAAG TCCTCTAGGC CTCCTTGGGG 4020 

GTCCTTCCAG CCTGAGGGGC TTCGGAGCTG AGGACAGCTG TTCTGGTAAG TGTCCCTGAG 4080 

TGTGGGGATG ACACATTTCC ATTCACTCTG AATCACAACA GAAAAGGGAA GAGGAATTGA 4140 

GGTAGGGAGC CTATTTAACC CTTGGGAGTC GGGAAGTAGG GAGGTTGAAA CTGTGACATG 4200 

GGTGACCAGG GAGTTGGGAA GGGACCCTTG GAGGTGGCTG TGGCAGGACA GGACGTTCCT 4260 

CCCGAGGGGC TCATGTGCCC TGGGCTCTCC CCATCTCTCA GATGCACGGG AACAAGCAGC 4320 

ACCTGCAGAA GGACTTCTTC CTGTACAACG CCTCCAAGGC CAGGAGCAAA ACCTACATCA 4380 

ACATGCGGGA GGTGTCCCAG CGCTTCCGCC TGCCTCCCAG CGAGTACGTC ATCGTGCCCT 4440 
CCACCTACGA GCCCCACCAG GAGGGGGAAT TCATCCTCCG GGTCTTCTCT GAAAAGAGGA 
ACCTCTCTGA GTGAGTGCTG GCCCAGCTTT CCCACGTGTT TCTAAAAGCT CACATGGCCC 
ACTCCAGAGG TTGAAGGCAT GAGGCAGCTA GACACGTCTC CTCCAGGGTC CTTCTGCTGC 
TCCTGAGCCA CTGGCCACAT TACCCCCATT CATTCATTCA TCCATTCTGT GATATTTATT 
GAGCACCTAC TATGTTCCAG GCACTGTCCT AGGCACTAAG GATAGAGTAG TGAAGTAAAC 
AGAAAGAAAT CCCTGCCTTC ATGGAGCTTA ATATTCTAAC ATGAGACAAT AATGGATAGG 
AAAAACATAT GTAGCATGTT AGATTTGGAG AGGTGATATG GAGCAAAAAT AAAGTAGGGA 
AGAGGGATAG GAGGTGTTGG GGATGCTTGA AATTTTAGGT TAGCATGGCC AGGAAAGCCA 
CATCCTGTCC CTGGCCACCA CAGATGAGCT CATAGCCCCT GCCACTCTGA TCTCTGTCCT 
TGGAAGATGC ACCAGGTCCA TGGGTAGGTG GCTGGGTCAT GCCTTTGGGG GGCTCTGAGC 
AATACTAACA agaacctgcg TGCCTGGGCT tggctgtcgg ggatggtgct gacatggggc 
TGGTTCCTGG GGTTGGGGTG TTCCAGGGGT TCTCTAGAGG CTGGTTCTGG CTTGGCTGCC 

AGGAAGCCGT gcaccagagc aaaccgtcca cgggcctcct gcttgcttct ggtgacactg 

AGACCCCACA TGTCTGTATT CCTCACAGGG AAGTTGAAAA TACCATCTCC GTGGATCGGC 
CAGTGGTGAG TGGTTTAGAT CTTCTGTGCG AAAAGTCCAG AGGGTCCCCT TCCCTGACCA 
TGCAGGGGAC AGATGGTGCA GGGGAGAATG GGCACTGGCA GAGGGAATGG GAGTCTGGGC 
TGTGCTGAGC AGTCCCTCCT TGGCACTGCA AATCCTACTT TGGCATGGCC AGAAGTAATC 
GGCCTTAAGC ACCGGGGGCC ATTGAGGCAG TTCAGGGGCT GGGAAATATG GAAGAGGGTC 
CTGGAAAGGA GAAGCAATTT GAACAATCGG AGGGAACAAG GCCACAGGAA GGGATGACAA 
GAGCCGCAGC GAACACTGGA TTCTGAGACT GGATAACATT GGATTTCACA CATAGAGAAA 
AGAAAGTAAG CTGGTGCCGG ACCTGGTGTT GACACTTGGA TCCTCCACTT ACCAGCGGGG 
TGACCTGGAC AATTTCTGTA ATCCCTCTCA CTCAGTTTCC TACTCAGTAA AACGGGGATG 
ATAATGTGCC TTGCAAGGCT TTTGTGAGGC TTCATCAATG AGGTGATGTA TGTGAAGTGT 
CTGGCACAGC ATGGGCACTC AAACAGAGGT GCTTTTTCAC ACTTTACACC TTACAAGGTA 
CTTTTCACAT GTGTCATCGC GATACTTGCA AGGTTGCTGA GAGGTAGATG GGGTTATAAT 
CCCTGGTGTT CAAGAAAGGA AGCAGAGGCT CAATGGGGTT GAATGACTTC TCTGAGTTCA 
CAGAGCTCAG TAAGTGGCAG GGTTTGGAAC TCACATTCAG ACTCTCTGAC TCCAGACTTA 
GGTTTTTCCG CACCTCCACG CTGAGGCCAG CCCCAGGCAG TGAGAAGCCC AAAGTCCGAA 
GCACAGAGTG CTGTGTGTTG GGCTCTGTGT GTTGAGGAGT CTTGTGACTG CCTTGGGGCT 
TTGGGCTGTA GTCAGCTGAC AGTCCTTTGT GCTCTGTGGG GATGACGTAG GCCAATGGGA 
GGACAAATGC CCCTCTGAAC TGTCTTCTGG GCAGTGACAG TCATGGTCAT AATCCTGACC 
CTGAGCCAGT GCCAGGTCTC CAAGTGCCTT CTGAATGACC ACAGGCGATT GGTTTTAGTG 
GTAGGTGCGT GGGGATCTGT TCTGGTCATC TGGATGCTGG TCATCGGGTG CAGTATTGAT 
CAGGACCTGC AAACCCAAAA GCTTATGGGA GCTGGCACGT CACGTGAGTA GAGCAGGCAG 
GTGCAGGGTT TTTGATGTCC CTGCACTGAC ACAGTTGTCT GCAGTTCTCC AATTTGACAT 
TTGGGCTCCA GTGTCGAGGG TCAAACAAGG AATTTTGGGG CGTGGGCCAA ATCTGGGAAG 
ACACAGGGAG CAGGGCCCTT TGGCTCAAGC TGATAGTTGC CGCAGGGATT ACCAGGCCCA 
GGGCAGCCTG CCACAAGCTG GGGCTTTTAC CAAAGAAAAT CTCCCTATGT TAAATGCTTG 
CTCAAAAATT TTTAAAAAAT ATTCTGTAAG TCAAAATCCA TTGTTAGGTC AGTTTGAGAG 
AGCCATGTTT TTGGTGTTTT AGTAACCAAT TTCATCTTTT TATTATTTAT TTATTTGTTT 
ATTTTTGAGA CGGAGTTTCA CTCTTGTCAC CCAGGCTGGA GTGCAATGGC ATGATCTCAG 
CTCACTGCAA CCTCCGCCTC CCGGGTTCAA GCAATTCTCC TGCCTCAGCC TCCTGAGTAG 
CTGAGATTAC AGGTGCCCAC CATCACGCCT GGATAATTTT TGTATTTTTT AGTCGAGATG 
GGGTTTCACC ATGTTGGCCA GGATAGTCCT GAACTACTGA CCTCAGATAA TCCGCCCACC 
TCAGCCTCCC AAAGTGCTGG GATTACAGGC ATGAGCCAGC ACGCCCGGCC ACCAATTTCA 
TTTTTTAAAA AAGGAAGAAA GAAAACCTTA GCCAGAAGAT CTTTTTCCTT GCCATATGCA 
GTAAGAGTAG ATTATAAAAA CAAAGTCAGA GCAGTCACTG GTGTCTGGGC ATGGAGGAGA 
AAGAAGAATT CTCTTCTCCC TTCACCCTCC ATGCCCCTTT TTGGCTCCAT GTGATTCAGA 
TTTCTGGACC CTGGAGCCCC ACCCCAAGCT AAAGACCAGG ATACAGGGAA GCCACAACCA 
CTGGCGGTTC TGAGAACTTA CTTTTCACTT ATTCTGCATT TACTGTTTCC TTTTCTTATG 
CAGAAAAAGA AAAAAACCAA GGTAGGTGTG TGGGTAGAGA GCATGAAGTG TGTGTACTCA 
TGCATATGTA TGTGCATGCA TGTGAAGTGT GCATGTGTGA GCTCATATGC ATCCATGCAC 
CAGACTTGCC TCTTCCTCCC CCTCCTTCCT GAGCTTCTGC TGGGGCCGAG CGTGCAGTAA 
TGACAACTAC GATTTGCTGG GGGAAGGCTA CGTGCCAAGC ACTCTTTTAG GTGCTTTCCA 
TGATTAATTC CTTCCTCACA ACAGCCCTAT GAGATTAGTA CTATAACTAT CCCCATTTTC 

AGAGGGAGAA aaggtacaga CTTGACTAAC ttgcccaagg ccacacagcc agagaggggc 

AGAGCCAGTA CTTAGAGCCA GGCAGTCTGG GTCCAGAGTC CGTGTCCTGA ACCACAAGAG 
GCCATCATAC GCCATCACAT TTGGTGCTAG CATTTCTGGT GGTGCCTGGT GGTGATGGAT 
CCATCACAGG GGTCCTCCAG GTACTGGTGC TGGCCCAGAC CAGAGCTGAC ACTCCTCAGG 
CACTACCACA TTCCAGGCAC TGTGCTTGGG GTCAGTCCCT CTCTTTTTTT TCCCCCCCAA 
TTATAACAGT ATCTACAAAG TAGGTGCTGT TATTTTTCCC CTTTCACAGG TGAGATAGAC 
TCAAAGAAGT GAACTTGCCC AAGGAACAGA ACTAATGAGT GGGGAAAATG GAACTGGAAA 
CCATGTCTGT TTACTCCAAA ACCTGTGTTT CTTGCCCTCT TTCTCTGATG CCAGCCCCCT 
ACACTTCAAG GCCTGTGTTG TCCAGACCCA CACTCGGGCC TGCCAGTGTG TGCCTGGCAG 
GGATGCTCCA TGGCCACACC ATATCCATCC TACACATCCC CCCTCAGACT GTGACCTCCA 
TTTGCTCTGG GATCCCCACA AGCTTCAGCT GCTTCAGCAA GACACTGCTT AGAAGGCAGA 
GCAAGCCAAG GCCTCTGGGG CCTGCTGGGA GCCAAAGCTG GGGAGCCGTT TCCACCGGTC 
TATCTGCTTG AGCTGTCCTA GATGAGCAGC ATGGAAGGGC AGTGGTGCAT GAGTCCAGGC 
GGGCTGCTTT TCTGCTCCGA GAGGCTCTGC CTGCCCAGTT GTTCTCTGCA TTGCAGCCTC 
AATCCCCACA GCCTTGCCTT CCCCCGGCTT TCCCTACAGG TGCACCGCAT CCACAGTGTT 
GGCACCATGC AGCAGCCGCT CTCCGTCCTT TTCATATCCT TGTCACTTGC ACGAGCATGT 
CTTGAAAATA TCCCTTGTTT GTGTAGCATC TTAAATGTTT TTGCAGTATG ATTTTGCATT 
CAGTATCTCA TTTGATCCCC ACAAGAGCCC TATGAGGAGG GAAAGCAGAT TTTACCATTA 
AAGGATGAGT AAACTGAGGC CAGAGAGGAT ATTTTTGGTT TTTTTTGAGA CAGTCTCACT 
CTGTCACCCA GCCTGGAGTG CAGTGGCTTG ATCTTGGCTC ACTCCAAGCT CCACCTCCCA 
TGTTCACACC ATTTTCCTGC CTCAGCCTCC CAAGTAGCTC GGACTACAGG CACCCACCAC 
CACACCCAGC TAATTTTTTT GTATCTTTAG TAGAGATGGG GTTTCACCCA GTTAGCCAGG 
ATGGTCTTGA TCTCCTGACC TTGTGATCTG CCTGCTTCGG CCTCCTAAAG TGCTGGGATT 
ACAGGCGTGA ACCCCCCTGC CCGGCCAGAG AGGATATTTC TTAATGAGGG GCAGGGCTGG 
GATTCCAGCC CAGTGTTCTG ATGGCTCACC CACTGACCAT TCCACTAATC CGTGTCCTTT 
TTCAATCTAA ACTTTCAGGG TTGTAGAGGT TCCTTTGAGG TGCCTCAGTA CTTCCATGGT 
GATGTGGGGT CTGAGGGCCA AGAGCTCTGT TCTCATTAAT CAGAGAAGCT TGTGTTTTTA 9360 

AAAACACCAT GTTTACTGCA GGAAATTTAA TTGGACAGTG TTTCCATCTG GAAAAAAAAA 9420 

AGTCTACAAA ATACTTGACA ATCACTGCAC TAGATCATGC TGCTTTTAGC ATTCTTAGCA 9480 

TTTCACGTGC TGAGCTCTCA ATACTCTACC ATGAGGAGGG ATGGAGTGGG TATGAAAAGA 9540 

TAAAGAACTG AAGTCACACG GCTTGTCAGT GGCAGAGATA GAGCTTGAAC CGAGGTTGAA 9600 

GAGCTCCCGC CTATTCCTTT CCTCTTCTCA CTGGATAAAG CTGCTCCAAG AGAGGTGCTG 9660 

CCTCAGTGTG CCTGTTCAGA CTGTAATCCT CCCTTCCTTC CTGCCTCCTC CCTCCTCTCT 9720 

CCAGCCCATC ATCTTCGTTT CGGACAGAGC AAACAGCAAC AAGGAGCTGG GTGTGGACCA 9780 

GGAGTCAGAG GAGGGCAAAG GCAAAACAAG CCCTGATAAG CAAAAGCAGT CCCCACAGGT 9840 

GTCTGGGCAT GTGGCATGGG TGGGGTGGCC AGCACGCTAC AGGGGCTTCC TATGCGCTTG 9900 

GGATACACAG GGGCTGGAGG CTTCCCAGGA GTTTGTCTTG AACATCTGGA GGTTTGAATT 9960 

TGTCCCACTG ACCTTTTCTT TCAGCAAGTT CCCCTGAAAT TTGGGCTGCT GCTTGGGTGA 10020 

ATATCCCAGG ATGGGGGTTC CATTCTAGGA GTGGACTGGC AGGCTGAGCC TCCCATGGAG 10080 

CTGATCCAGC CAGGATACAG AGAAGGGGAG GCAAAGGCTG AGACAGAACC AGCTTGAGAG 10140 

CGGAGGCGCA ACTCTTGTCT CCTGGTGGCC TTGAGCATTT CACAATAGGG GGATAAAGGA 10200 

TAGGAGCAGA AAAGTGGGGC TGACTTCAGA AATGGGGTCC TCTAGAGCTC ACGGGAGGGT 10260 

GTTAGATTGG AGTGGGAGCT TAGTGGAGGT GAGCCTTAGA GGCAAAAGTC TCCAGACCAA 10320 

TCCAGGCCCC CTCTTCTATC CGGGGGCCCC TCTTCTATCC AGGGCCCCTC TTCTGTCTGG 10380 

GAGCCCCTCT TCTATCTGGG GCCTCATGCA GTGGGGCCTA GGGGAGGTTC TCTGAGGACT 10440 

TGGCCTTGAT GACAGGGTGG CTGGAGGAAT CAGAACGGTC AGACCTTCTT TGACCTGCGG 10500 

GCACCTTTAG TTGGAATGCT CAGGCCTGGG ATGGTGGAGG GGGCTCTTGC AGGTGGGGAC 10560 

TGGGGTGGCG GGGAGGAGGC TGTATGGCCG CCATATCTCC TTTGGCTGGG GGCGTCAGGG 10620 

CTGGAGAGGT GTGAAGAGTC CCTGAGGCCT CGATGCATCT CACTCCAGCT CACCAGGTCT 10680 

GCATTTGCCC GTCCCCAGCT CCTGCTGCCA CCCCCGGCCG TTTTAGGCAC TTGGCTCCCT 10740 

TGGCCCAGAG GAGCTTGCCT CACAGGCCTG TGCACCTCTG ACCCCTGTGA ACCAGTTTTC 10800 

CTTTGTGCCT CCACAGCCAC AGCCTGGCAA CTCTGATCAG GAAAGTGAGG AACAGCAACA 10860 

ATTCCGGAAC ATTTTCAAGC AGATAGCAGG AGATGTGAGT ACCTCCAAGC CCAGGACGCC 10920 
CACAGGTGCT TCCTTCTCTC CTGGATTAAC TGCTCAGATT ACCAATTATT TCATTATTGT 10980 
TTGGTAGAGG TCACTTTGGA CTTCGGTGGA GCCAGGGGAT GTGTGCGTAG CACACAAATC 11040 
CACAAGCCCT TGAGTTTTGG ACTGCCACGT CTGCTGGGGG GCTCAGAGGC CTTTTTGCTC 11100 
TGAGCTGCCC ACGGTGGTCC TGATAGCTGA GGTGCAGTAT CTGGCCCCCT GTCTTCCTCA 11160 
GAAAAGCCCC AGCTTCCCAT GACATAATAG CACCGACAGG GATTTTACAA ACACAGCCAG 11220 
GTGGAATTTG TTTTGCAAAG TGTCCGCGCC AGGAGCTGCT GTACTCCTGA ACCATGACCC 11280 
TCCTCTCCCT TCCTCCTCAG GACATGGAGA TCTGTGCAGA TGAGCTCAAG AACGTCCTTA 11340 
ACACAGTCGT GAACAAACGT GAGTTGCTCA AACCAAATGG GGGTGGGGTG GGTGGGGAGT 11400 
CCCGTTGTCT CAAAGCAGCT CCTCACTCTT CTCCATCCCC CCAGACAAGG ACCTGAAGAC 11460 
ACACGGGTTC ACACTGGAGT CCTGCCGTAG CATGATTGCG CTCATGGATG TATCCTTCCT 11520 
GCCGCCCCTT CCCGACCCTC TGTCATCAGC CCACGGGGGC CAAGGCAACA TACAGGGTGC 11580 
CCAGTCAGGC AAAGGGCCCT AATTTGTGCC CAGGGAAACT TAAGGAGACC CTGATTCAGA 11640 
ACATCTTGGA TACTCGTCTG AAAGGGGTTG TTAGAGGCGG AAGGGGAGGA TGTTGGGTTG 11700 
TAACTGCCCT AACCCCTGTG CTTCTCTCAG GCCTGGGATC CTGCCCAAGC AAAAGTGGTC 11760 
CTTAGGAGAG CGGCTCCTGG GTTACAGAGT AGGCGCAATC TCTGACTGGT GGTGGAGTGG 11820 
AGGGGAGGGT TAAATAGTAC AACAGGGCAG TGGGTAGGAC AGCCCGGAGT CTCCTAGACC 11880 

CTCCCTCCAA ATCCAGGGGG ATTTTGCTGT GTGCTGTGTA GCCCTGACCT CCCTCCTCCA 11940 

GACAGATGGC TCTGGAAAGC TCAACCTGCA GGAGTTCCAC CACCTCTGGA ACAAGATTAA 12000 

GGCCTGGCAG GTGGGAAGAG AAAATGAAGC GTGGGAGTCA AGAATGGGGT TGATTTGGAG 12060 

ATTCAGTGTG TGACCTCCAT CCTCAAATTT TCTATTGCCA GAAAATTTTC AAACACTATG 12120 

ACACAGACCA GTCCGGCACC ATCAACAGCT ACGAGATGCG AAATGCAGTC AACGACGCAG 12180 

GTGCTGAGAA GGAAGGGGTG TCAGGGATGT GGACCCGAGA CGGTGGGAGC AGGAATGGCA 12240 

GGGGACTAGC TACTAGGGCC CCACTAGAGA AGGAGAGGGA AAGGGCTTCT CACTTTCCCT 12300 

TCCCAGGTCA CAGAGTGTCC GAGAGGCAGG GAAAATAGAA GACAGGCCCA AGGCCTCCAG 12360 

CTCCACGTCC ACCTCTAACA TGGTCCCCTC CACAGGATTC CACCTCAACA ACCAGCTCTA 12420 

TGACATCATT ACCATGCGGT ACGCAGACAA ACACATGAAC ATCGACTTTG ACAGTTTCAT 12480 

CTGCTGCTTC GTTAGGCTGG AGGGCATGTT CAGTAAGTGG GAGACGGGGG CTGCCCTCTG 12540 
CTCTCTTGCA GGGGCAGTTG TGGCAACAGG CATCTCACCT GATAATCTCC AGTCTGCTCC 12600 

ATCCAGGCTG AACAAGGGCC AATGACCTCT TTAGGCCCAG AATGGGATGG CAAAGGGAGG 12660 

GTTACTGGTG ATTCTCTGCC TGCACATCTT TGTGCTGATG AGGGACAGCA CTGGGCACAC 12720 

GGTCCTCTGA GGGGAAGTTA CAGTAGTAGA GGCGGAGTGC GCCTGTAACT GGCCTCTGGC 12780 

CTGTGCATTC TTTCACAGGA GCTTCTCATG CATTTGACAA GGATGGAGAT GGTATCATCA 12840 

AGCTCAACGT TCTGGAGGTA AAGCATAGGC ACAGCACATT CCCCCTACAC ATTAAAACTC 12900 

AAGGTGGAGG GGTCAACGGG GCGGACTGGA CCCAGGGTGT GCTCCTCATT TCCACACAGT 12960 

GGTGGAGGGA AGGGATAGGA ACAGAACATG GAGGGAGGCT CAGCAGGCTC CCAGGACACA 13020 

TGCACTTGAG GCCCAAAAGG ACCTCTGCTC CCCCAGTCAC TTGATGCGGG AAAACATGCA 13080 

CCTTCTTAGG GAAGATCTAG GAGAAAGGAA ACAGTAAGCC ACTGCTTCTT GGAAAATCTT 13140 

CTGGGGGTCT GACCTGCTGG GACTGTTCCC TTTCCTCTTG CCCCGTAAGA TTCCTAGGGC 13200 

GGGGGGGGGG GGGGGTCACT CTTTTCTGAT CTACATTCTG ATCTTGGGAC TTCTTTCAGT 13260 

GGCTGCAGCT CACCATGTAT GCCTGAACCA GGCTGGCCTC ATCCAAAGCC ATGCAGGATC 13320 

ACTCAGGATT TCAGTTTCAC CCTCTATTTC CAAAGCCATT TACCTCAAAG GACCCAGCAG 13380 

CTACACCCCT ACAGGCTTCC AGGCACCTCA TCAGTCATGT TCCTCCTCCA TTTTACCCCC 13440 

TACCCATCCT TGATCGGTCA TGCCTAGCCT GACCCTTTAG TAAAGCAATG AGGTAGGAAG 13500 

AACAAACCCT TGTCCCTTTG CCATGTGGAG GAAAGTGCCT GCCTCTGGTC CGAGCCGCCT 13560 

CGGTTCTGAA GCGAGTGCTC CTGCTTACCT TGCTCTAGGC TGTCTGCAGA AGCACCTGCC 13620 

GGTGGCACTC AGCACCTCCT TGTGCTAGAG CCCTCCATCA CCTTCACGCT GTCCCACCAT 13680 

GGGCCAGGAA CCAAACCAGC ACTGGGTTCT ACTGCTGTGG GGTAAACTAA CTCAGTGGAA 13740 

TAGGGCTGGT TACTTTGGGC TGTCCAACTC ATAAGTTTGG CTGCATTTTG AAAAAAGCTG 13800 

ATCTAAATAA AGGCATGTGT ATGGCTGGTC CCCTTGTGTT TTGTTGTCTC ACATTTAGAT 13860 

ATCAGCCATG CATGACTGAA TGGCTTCCAA TCATATACTC ACCTATCACC TACAAGAGAA 13920 

CAATGAAAAA CACACACAAA AACAAAATCT TGAATTTTGT AATCATGCCT ATTGCTATTT 13980 

CTTGAGCATA AGAATGGCTC AGATACTTTC CAAGACATAA AAGGAAGGCA GAGGAATAGT 14040 

TGTTGCTGTA AAAGACATCA AGAATAAATG GGGTCATGTA CAACGGGAGG GGCCGGTTAC 14100 

CTGAATAATG GAGTGGAGAT TGAGCTATCC TAGCTCCTCT GCTCACTAAC TGACCTGTCG 14160 